HUMAN DNA SEQUENCES
Background of the Invention 

Current methods for testing pharmacological substances rely on a three-stage approach to
drug development. First, candidate compounds are typically screened in an in vitro system,
such as an assay for inhibition of cancer cell growth. Candidates are then tested in an animal
model, as a first approximation of systemic effects, including efficacy and toxicity.
Compounds that still show promise after these initial in vivo screens are finally tested in
humans. Human testing itself typically occurs in three phases: toxicity; preliminary
efficacy; and efficacy. The entire process can take more than a decade and cost hundreds of
millions of dollars. Beyond the monetary cost and protracted time scale, current testing
regimes sacrifice countless laboratory animals and needlessly endanger human subjects.

A need exists, therefore, for more sophisticated drug screening techniques that can 
be done rapidly in vitro. These screening techniques ideally will be reflective of systemic 
and/or organ-specific responses, so that they provide a reliable indicator of action in a 
human body. Current techniques, however, tend to utilize only a single or limited number 
of markers, thus answering only very simple questions that are of questionable medical 
import. For example, a typical in vitro assay may ask whether a lead compound binds a 
particular receptor, which has been implicated in a certain disorder. It is presumed that 
such binding is indicative of therapeutic usefulness, but it does not even purport to address 
systemic effects. 

Not only are screening techniques for efficacy inadequate, the available toxicity 
screens likewise are inadequate. Toxicity, on a first level, is usually measured by animal 
testing. Aside from the complications related to in vivo versus in vitro testing, such screens 
are insufficient because of differences in metabolism, uptake, etc., relative to humans. 
Thus, improved methods would not only be in vitro-based; they would also be more
"human."

With the increasing miniaturization of screening assays and the growing availability 
of targets for pharmaceutical intervention, there is increasing interest in developing arrays 
containing large numbers of these targets that can be assayed simultaneously. If such an 
array contains a large enough population of targets, it can be used to essentially mimic the
systemic response. In other words, the array becomes an in vitro surrogate for the human
body. The more refined the array, the more accurate the predictive capability. In theory,
an array could be constructed that can detect all of the known human expression products
simultaneously, thereby providing a very reliable indicator of the human response to a
given compound. These arrays offer advantages over the present in vitro screening systems 
in that they can assay large numbers of responses simultaneously. They are superior to 
animal testing because they are more "human" and, thus, more predictive of human 
responses. 

In order to construct such arrays, however, the field is in need of further human 
targets. Advantageously, such targets will be provided with additional physiologically 
relevant information, such as whether the target is expressed in a particular tissue and 
whether it is related to a known functional class of targets. In this way, the artisan can 
focus as needed, for example, on tissue-specific effects or target class-specific effects, 
thereby providing information useful in evaluating efficacy and/or toxicity. 

In addition to a need for pharmacological screening targets, there is a need for 
further pharmacological substances. These substances can be used in the formulation of 
medicinal compositions and in treating a wide variety of disorders. 

The present invention responds to the aforementioned and other needs in the field by 
providing a population of novel targets useful, inter alia, in the profiling and medicinal 
contexts described above. 

Summary of the Invention 

It is an object of the invention, therefore, to provide a set of human cDNA clones. 
Further to this object, the invention provides sequences of human cDNA clones that were 
isolated from libraries generated from different human tissues. 

It is another object of the invention to provide assemblages of targets useful in
profiling matrices for screening pharmacological test compounds. According to this object,
assemblages comprising different populations of human nucleic acids, proteins and
antibodies are provided. In different embodiments, cDNA library-specific assemblages and
target-family-specific assemblages are provided.
It is a further object of the invention to provide a database of human nucleotide and
protein sequences. Further to this object, novel human nucleotide and protein sequences 
are provided in electronic form. In one embodiment, one or more of these sequences is 
provided in a searchable database. 

It is still another object of the invention to provide biologically active target 
molecules useful in treating or detecting human disorders. Further to this object, the 
invention provides nucleic acid and protein molecules that have the capacity to affect 
disease etiology or symptoms or correlate with known disease states. Also further to this 
object, a database is provided which comprises the disclosed molecules in electronic form. 

It is still a further object of the invention to provide polypeptides encoded by the 
human cDNA clones disclosed herein. Further to this object, the invention provides 
antibodies and fragments thereof that are capable of binding to a specific portion of these 
polypeptides. 

It is yet another object of the invention to provide pharmaceutical compositions which 
comprise an effective amount of a pharmaceutical agent, wherein the pharmaceutical agent is 
selected from the group consisting of one or more polypeptides contemplated by the invention, 
variants or functional derivatives thereof, and antibodies thereto; and a physiologically 
acceptable carrier or excipient. 

It is still another object of the invention to provide expression vectors comprising one
or more human cDNA clones disclosed herein or fragments thereof, and optionally a
promoter operably linked to the cDNA clone or fragment thereof. Further to this object, the
invention provides methodology for recombinantly producing a desired peptide, comprising 
expressing in a host cell a peptide encoded by a human cDNA clone disclosed herein. 

Detailed Description 

The invention results from a need in the art for new human nucleic acids and proteins. 
This need arises in several contexts. First, there is a need to identify targets for therapeutic 
intervention. Second, there is a need to identify molecules that may be adversely affected in a 
therapeutic context, thereby resulting in toxicity. Knowledge of these molecules will aid in 
the design of new medicaments with enhanced efficacy and decreased toxicity. Finally, the
need encompasses human nucleic acids and proteins that have medicinal applicability in their 
own right. 

In view of these needs, the present inventors set out to isolate and sequence human
cDNAs from tissue-specific libraries, so that the resulting molecules represent subsets likely
to be targets for therapeutic intervention or for avoiding toxicity. In addition, the inventors
divided the molecules into various sub-categories of pharmacological interest, based on
suspected functionality, structural similarity, etc. These molecules are
disclosed in provisional application serial nos. 60/149,499 and 60/156,503, filed August 18, 
1999, and September 28, 1999, respectively, both of which are hereby incorporated by 
reference in their entirety. 

GENERAL DESCRIPTION OF THE INVENTIVE MOLECULES 

The present invention provides novel polynucleotide molecules that, in some 
instances, have similarities with known molecules. The inventive DNAs were cloned from 
five different human cDNA libraries. In addition to these DNA molecules, the invention 
provides their protein translations and antibodies derived from them. The inventive DNA and 
protein sequences are shown individually, below. The inventive nucleic acids also include the
complements of these DNA sequences, as well as their RNA counterparts. Methods of 
producing the molecules also are provided. Further, the invention provides methods for 
detecting all or part of the molecules and of detecting polynucleotides encoding all or part of 
the molecules. 

The inventive molecules derive from five cDNA libraries: human fetal brain; human 
fetal kidney; human mammary carcinoma; human testis; and human uterus. For convenience, 
each sequence bears a designation that indicates from which library it is derived. In 
particular, these designations are: "hfbr" for human fetal brain; "hfkd" for human fetal
kidney; "hmcf" for human mammary carcinoma; "htes" for human testis; and "hute" for
human uterus. The individual libraries were constructed and screened as described below in 
the examples. 

The protein and DNA molecules of the invention are variously described herein as 
"target" molecules or "inventive" molecules. The sequences and other information pertinent 
to the nucleic acid and protein molecules of the invention are shown, below. 
Interpreting the data disclosed with the Table and cDNA sequences, below:

The table and data below provide the coding sequences of the inventive cDNAs, as well as
the protein sequences and other useful information.

Grouping 

The clones were assigned to the following fourteen functional and/or tissue-derived groups:

1. Cell Cycle 

2. Cell Structure and Motility 

3. Differentiation/Development

4. Intracellular Transport and Trafficking 

5. Metabolism 

6. Nucleic Acid Management 

7. Signal Transduction 

8. Transmembrane Protein 

9. Transcription Factors 

10. Brain derived 

11. Kidney derived

12. Mammary Carcinoma derived 

13. Testes derived 

14. Uterus derived 

Description of Clone Files 

The individual clone files all follow the same structure, with the sections separated by
paragraphs.

1. Clone Name 

The clone names are deciphered with reference to the following example: 
DKFZphfkd2_24e23, wherein the code represents: 

• producer of library ("DKFZ") (for convenience, this reference may be 
eliminated) 

• a "p" for "plasmid cDNA library" (for convenience, this reference may be 
eliminated)

• library name (e.g., hfbr = human fetal brain; hfkd = human fetal kidney; hmcf =
human mammary carcinoma; htes = human testes; hute = human uterus)

• an underscore ("_") to separate library information from plate information 

• plate number (e.g., "16")

• plate coordinates (letter first; e.g., "f14")
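The naming scheme above can be checked mechanically. The following Python sketch is a hypothetical helper, not part of the original disclosure. It assumes that each library name is a letter code followed by a single digit (as in "hfkd2" from the example) and that the "DKFZ" and "p" prefixes may be omitted, as the list allows.

```python
import re
from dataclasses import dataclass

# Hypothetical lookup: library letter codes -> tissue, as listed above.
TISSUES = {
    "hfbr": "human fetal brain",
    "hfkd": "human fetal kidney",
    "hmcf": "human mammary carcinoma",
    "htes": "human testis",
    "hute": "human uterus",
}

# Assumed grammar: optional "DKFZ", optional "p", library letters plus one
# digit (e.g. "hfkd2"), "_", plate number, then the coordinate (letter first).
CLONE_NAME = re.compile(
    r"^(?:DKFZ)?p?"
    r"(?P<code>[a-z]+)(?P<version>\d)"
    r"_(?P<plate>\d+)"
    r"(?P<row>[a-z])(?P<column>\d+)$"
)


@dataclass(frozen=True)
class CloneName:
    library: str  # e.g. "hfkd2"
    tissue: str   # e.g. "human fetal kidney"
    plate: int    # e.g. 24
    row: str      # plate-coordinate letter, e.g. "e"
    column: int   # plate-coordinate number, e.g. 23


def parse_clone_name(name: str) -> CloneName:
    """Split a listed clone name into the fields described above."""
    match = CLONE_NAME.match(name)
    if match is None or match["code"] not in TISSUES:
        raise ValueError(f"not a recognized clone name: {name!r}")
    return CloneName(
        library=match["code"] + match["version"],
        tissue=TISSUES[match["code"]],
        plate=int(match["plate"]),
        row=match["row"],
        column=int(match["column"]),
    )


print(parse_clone_name("DKFZphfkd2_24e23"))
# CloneName(library='hfkd2', tissue='human fetal kidney', plate=24, row='e', column=23)
```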

2. Group 
3. Introduction

short review of the similarities, function of the protein and possible applications 

4. Short Information 

specifications about the cDNA (who sequenced, completeness of the cDNA, similarity,
chromosomal localisation, length of cDNA, localisation of poly A tail and
polyadenylation signal)

5. cDNA-Sequence 

6. BLASTn Results 

search results of blasting the cDNA sequence against all public databases 

7. Medline Entries 

information about genes/proteins similar to the novel cDNA (if available) 

8. Putative Encoded Protein Information 

specifications about the encoded protein (ORF: length and localisation of the reading frame) 

9. Protein Sequence 

10. BLASTp Results 

search results of blasting the protein sequence against all public databases 

11. Pedant Information

output of fully automated annotation: summarises peptide information, homologies, patterns 
as follows: 

[Length] 

- length of the protein = number of amino acid residues 

[MW] 
- molecular weight of the protein

[pI]

- isoelectric point 
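As a rough illustration of how the [Length] and [MW] fields can be derived from a protein sequence, the sketch below sums average residue masses and adds one water molecule. The mass table is an assumption (commonly tabulated average masses, e.g. those used by ExPASy), not necessarily the values PEDANT used. The isoelectric point is not computed here because it depends on the choice of pKa set.

```python
# Average residue masses in daltons (amino acid minus water), as commonly
# tabulated (e.g. by ExPASy); treat these values as an assumption.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # one water per chain accounts for the free termini


def protein_length(sequence: str) -> int:
    """[Length]: number of amino acid residues."""
    return len(sequence)


def average_molecular_weight(sequence: str) -> float:
    """[MW]: sum of average residue masses plus one water, in daltons."""
    residues = sequence.upper()
    unknown = set(residues) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER


peptide = "MKV"  # hypothetical peptide
print(protein_length(peptide), round(average_molecular_weight(peptide), 2))
```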
[HOMOL] 

- shows the protein with the closest similarity to the cDNA-encoded protein
[FUNCAT] 

- functional information according to a catalogue developed by the Munich
Information Center for Protein Sequences (MIPS)

[BLOCKS] 

- Blocks are multiply aligned ungapped segments corresponding to the most 
highly conserved regions of proteins. The blocks for the Blocks Database are made 
automatically by looking for the most highly conserved regions in groups of proteins 
documented in the Prosite Database. The Prosite pattern for a protein group is not 
used in any way to make the Blocks Database and the pattern may or may not be 
contained in one of the blocks representing a group. These blocks are then calibrated 
against the SWISS-PROT database to obtain a measure of the chance distribution of 
matches. It is these calibrated blocks that make up the Blocks Database. The WWW 
versions of the Prosite and SWISS-PROT Databases that are used on this server are 
located at the ExPASy World Wide Web (WWW) Molecular Biology Server of the 
Geneva University Hospital and the University of Geneva. World Wide Web URL 
http://blocks.fhcrc.org/blocks/about_blocks.html/ is the entry point to the database. 

- here Blocks segments found in the analysed protein sequences are displayed 
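To show what a block of ungapped segments can be used for, the following sketch turns a block into a simple position-specific log-odds table and scans a query for its best-scoring window. This is a simplified stand-in, not the Blocks Database procedure: it assumes a uniform amino acid background and a flat pseudocount, and it skips the sequence weighting and the calibration against SWISS-PROT described above. The example block is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_to_pssm(segments, pseudocount=1.0):
    """Log-odds score table (bits) for each column of an ungapped block.

    Assumes a uniform background and a flat pseudocount; the real Blocks
    Database weights sequences and calibrates scores, which is not done here.
    """
    width = len(segments[0])
    if any(len(s) != width for s in segments):
        raise ValueError("block segments must be ungapped and of equal length")
    background = 1.0 / len(AMINO_ACIDS)
    denominator = len(segments) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(width):
        counts = Counter(s[i] for s in segments)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / denominator / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, query):
    """Return (1-based start, score) of the best ungapped window, or None."""
    width = len(pssm)
    best = None
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        # Residues outside the 20 standard codes score 0, i.e. like background.
        score = sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))
        if best is None or score > best[1]:
            best = (start + 1, score)
    return best


block = ["GKST", "GKSS", "GRST"]  # hypothetical aligned segments
print(best_window(block_to_pssm(block), "AAAGKSTAAA"))
```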
[SCOP] 

Nearly all proteins have structural similarities with other proteins and, in some 
of these cases, share a common evolutionary origin. The scop database provides a 
detailed and comprehensive description of the structural and evolutionary 
relationships between all proteins whose structure is known, including all entries in 
Brookhaven National Laboratory's Protein Data Bank (PDB). It is available as a set of 
tightly linked hypertext documents which make the large database comprehensible 
and accessible. In addition, the hypertext pages offer a panoply of representations of 
proteins, including links to PDB entries, sequences, references, images and interactive 
display systems. World Wide Web URL http://scop.mrc-lmb.cam.ac.uk/scop/ is the 
entry point to the database. Existing automatic sequence and structure comparison
tools cannot identify all structural and evolutionary relationships between proteins. 
The scop classification of proteins has been constructed manually by visual inspection 
and comparison of structures, but with the assistance of tools to make the task 
manageable and help provide generality. Proteins are classified to reflect both 
structural and evolutionary relatedness. Many levels exist in the hierarchy, but the 
principal levels are family, superfamily and fold. The exact position of boundaries
between these levels is to some degree subjective. Scop evolutionary classification is
generally conservative: where any doubt about relatedness exists, we made new 
divisions at the family and superfamily levels. 

- here SCOP segments found in the analysed protein sequences are displayed

[EC] 

ENZYME is a repository of information relative to the nomenclature of 
enzymes. It is primarily based on the recommendations of the Nomenclature 
Committee of the International Union of Biochemistry and Molecular Biology 
(IUBMB) and it describes each type of characterized enzyme for which an EC 
(Enzyme Commission) number has been provided. World Wide Web URL 
http://www.expasy.ch/enzyme/ is the entry point to the database. 

- here EC-number and name of enzymes with similarity to the analysed protein 
sequences are displayed 

[PIRKW] 

- functional information according to the Protein Information Resource (PIR) 
database catalogue developed by Munich Information Center for Protein Sequences 
(MIPS), the National Biomedical Research Foundation (NBRF) and the International 
Protein Information Database in Japan (JIPID). 

[SUPFAM] 

- information according to the Protein Information Resource (PIR) database 
catalogue of protein superfamilies developed by Munich Information Center for 
Protein Sequences (MIPS), the National Biomedical Research Foundation (NBRF) 
and the International Protein Information Database in Japan (JIPID). 
[PROSITE] 
please refer to 12. PROSITE Motifs
[PFAM] 

please refer to 13. PFAM Motifs 

[KW] 

- overall 2-dimensional folding information

- 3D indicates that the protein is similar to a protein whose 3-dimensional
structure is known

- overall structural information 


The last PEDANT-block depicts information about the folding structure of the 
protein generated by PREDATOR. PREDATOR is a secondary structure prediction 
program. It takes as input a single protein sequence to be predicted and can optimally 
use a set of unaligned sequences as additional information to predict the query 
sequence. The mean prediction accuracy of PREDATOR is 68% for a single sequence 
and 75% for a set of related sequences. PREDATOR does not use multiple sequence 
alignment. Instead, it relies on careful pairwise local alignments of the sequences in 
the set with the query sequence to be predicted. 

World Wide Web URL http://www.embl-heidelberg.de/argos/predator/predator_info.html is the
entry point to the program.

- H = helix, E = extended or sheet, _ = coil, T = transmembrane, B = beta 

- x indicates a low-complexity region with repeat-like structure which is 
omitted in all BLAST searches 
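The one-letter codes listed above lend themselves to a simple summary of predicted secondary structure. The sketch below assumes the prediction is available as a plain string using those codes (the exact layout of the PEDANT/PREDATOR output is not reproduced here). It lists the non-coil runs with 1-based, inclusive residue positions and the fraction of residues in each state.

```python
from itertools import groupby

# One-letter codes as listed above for the PEDANT/PREDATOR block.
STATE_NAMES = {"H": "helix", "E": "extended/sheet", "_": "coil",
               "T": "transmembrane", "B": "beta"}


def structured_segments(prediction: str):
    """List (state, first, last) runs, 1-based and inclusive, skipping coil."""
    segments, position = [], 0
    for state, run in groupby(prediction):
        length = len(list(run))
        if state != "_":
            segments.append(
                (STATE_NAMES.get(state, state), position + 1, position + length))
        position += length
    return segments


def composition(prediction: str):
    """Fraction of residues assigned to each predicted state."""
    total = len(prediction)
    return {STATE_NAMES.get(s, s): prediction.count(s) / total
            for s in set(prediction)}


pred = "__HHHH__EEE_"  # hypothetical prediction string
print(structured_segments(pred))
# [('helix', 3, 6), ('extended/sheet', 9, 11)]
```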

12. PROSITE Motifs 

PROSITE is a database of protein families and domains. It consists of biologically significant 
sites, patterns and profiles that help to reliably identify to which known protein family (if 
any) a new sequence belongs. World Wide Web URL http://www.expasy.ch/prosite/ is the 
entry point to the database. A description of the prosite consensus patterns is also provided, 
below. 
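PROSITE patterns use a compact syntax (for example, the N-glycosylation site pattern N-{P}-[ST]-{P}). The following sketch translates the common elements of that syntax into a Python regular expression and reports overlapping hits. It is a simplified illustration, not ScanProsite: it handles residues, x, [allowed] and {forbidden} sets, (n) and (n,m) repeats, and the < and > terminal anchors, but rejects rarer forms such as > inside brackets.

```python
import re

# One pattern element: a residue, x, [allowed] or {forbidden}, with optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate PROSITE pattern syntax (e.g. 'N-{P}-[ST]-{P}.') into a Python regex."""
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        if element.startswith("<"):  # anchored at the N-terminus
            regex.append("^")
            element = element[1:]
        at_c_terminus = element.endswith(">")  # anchored at the C-terminus
        if at_c_terminus:
            element = element[:-1]
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            piece = "."
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"
        else:
            piece = core  # a single residue or a [..] set is already valid regex
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        regex.append(piece)
        if at_c_terminus:
            regex.append("$")
    return "".join(regex)


def find_sites(pattern: str, protein: str):
    """All (1-based position, matched text) hits, including overlapping ones."""
    lookahead = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(protein)]


# PROSITE PS00001 (N-glycosylation site): N-{P}-[ST]-{P}
print(find_sites("N-{P}-[ST]-{P}.", "MNGSAKNPTV"))
# [(2, 'NGSA')]
```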

13. PFAM Motifs 

PFAM (protein families) is a large collection of multiple sequence alignments and hidden 
Markov models covering many common protein domains. World Wide Web URL
http://www.sanger.ac.uk/Pfam/ is the entry point to the database. 



Deposit of Clones 

Clones were deposited as a pool with the American Type Culture Collection under
accession number ______, from which each clone comprising a particular
polynucleotide is obtainable. Each clone has been transformed into separate bacterial cells
(E. coli) in this composite deposit.

The clones may also be obtained from the Resource Center of the German Human
Genome Project (Heubner Weg 6, 14059 Berlin, GERMANY). The Resource Center clone
numbers are slightly different from those presented here, but may be readily derived using the
following key or with the assistance of Resource Center personnel.

The library name becomes a number: brain (hfbr2) becomes 564; kidney (hfkd2)
becomes 566; mammary carcinoma (hmcf1) becomes 727; testis (htes3) becomes 434; and
uterus (hute1) becomes 586. Next, the letter of the plate coordinate is capitalized and its
number is converted to two digits (e.g., "g5" becomes "G05"); the plate number is then moved
behind the plate coordinate, and the underscore is dropped. The following examples are
helpful:

Listed Number          Resource Center Number
DKFZphfbr2_16f21       DKFZp564F2116
DKFZphfkd2_1j9         DKFZp566J091
DKFZphmcf1_1c23        DKFZp727C231
DKFZphtes3_14g5        DKFZp434G0514
DKFZphute1_17k7        DKFZp586K0717

The libraries were constructed using two commercially available vectors. The brain
(hfbr2 designations) and kidney (hfkd2 designations) libraries utilize pAMP1 from Life
Technologies and are maintained in XL-2Blue (Stratagene); the uterus (hute1), testes (htes3)
and mammary carcinoma (hmcf1) libraries are constructed in pSPORT1, also from Life
Technologies, and are maintained in DH10B (Life Technologies). In addition to the following
techniques, consultation with the commercial literature available on these vectors will make
evident all of the housekeeping techniques needed to propagate and isolate the individual
constructs. All inserts may be excised with a NotI/SalI digestion. Alternatively, universal
primers flanking the cloning region may be used to amplify the inserts by PCR.
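The examples in the Resource Center table are consistent with a purely mechanical conversion. The hypothetical helper below encodes the key as read from those five examples: the library name maps to its number, the coordinate letter is capitalized and its number padded to two digits, and the plate number follows unchanged. It only knows the five libraries listed above.

```python
import re

# Library-name-to-number key given above (only these five libraries).
LIBRARY_NUMBERS = {"hfbr2": "564", "hfkd2": "566", "hmcf1": "727",
                   "htes3": "434", "hute1": "586"}

LISTED_NAME = re.compile(
    r"^(?:DKFZp)?(?P<library>[a-z]+\d)_(?P<plate>\d+)(?P<row>[a-z])(?P<column>\d+)$"
)


def to_resource_center(listed: str) -> str:
    """Convert a listed clone name to its Resource Center number."""
    match = LISTED_NAME.match(listed)
    if match is None or match["library"] not in LIBRARY_NUMBERS:
        raise ValueError(f"cannot convert {listed!r}")
    coordinate = match["row"].upper() + match["column"].zfill(2)  # "g5" -> "G05"
    return "DKFZp" + LIBRARY_NUMBERS[match["library"]] + coordinate + match["plate"]


for name in ["DKFZphfbr2_16f21", "DKFZphfkd2_1j9", "DKFZphmcf1_1c23",
             "DKFZphtes3_14g5", "DKFZphute1_17k7"]:
    print(name, "->", to_resource_center(name))
```

Padding the coordinate number rather than the plate number is the reading that reproduces all five tabulated pairs.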
Bacterial cells containing a particular clone can be obtained from the composite
deposit as follows: 

An oligonucleotide probe or probes should be designed to the sequence that is known 
for that particular clone. This sequence can be derived from the sequences provided herein, 
or from a combination of those sequences. Methods of probe design are presented below. 
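As a minimal sketch of one common first pass at oligonucleotide probe design (not the method described elsewhere in this document), the code below slides a fixed-length window over a known clone sequence and keeps windows of moderate GC content. It reports each window's reverse complement with a Wallace-rule melting temperature (2 °C per A/T plus 4 °C per G/C, a rough rule of thumb for short oligonucleotides). The window length and GC range are illustrative defaults, not values taken from the source.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rough Tm (deg C) for short oligos: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def candidate_probes(target: str, length: int = 20, gc_range=(0.4, 0.6)):
    """(1-based start, probe, GC fraction, Wallace Tm) for windows in the GC range.

    The probe is the reverse complement of each window; either strand will
    hybridize to denatured plasmid DNA in a colony lift.
    """
    hits = []
    for start in range(len(target) - length + 1):
        window = target[start:start + length].upper()
        if set(window) - set("ACGT"):
            continue  # skip windows containing ambiguous bases such as N
        gc = gc_fraction(window)
        if gc_range[0] <= gc <= gc_range[1]:
            hits.append((start + 1, reverse_complement(window), gc, wallace_tm(window)))
    return hits


print(candidate_probes("ATGCATGCAT", length=4, gc_range=(0.5, 0.5))[:2])
```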

Oligonucleotide probes may be labeled with [γ-32P]ATP (specific activity 6000
Ci/mmol) and T4 polynucleotide kinase using commonly employed techniques for labeling
oligonucleotides. Other, non-radioactive labeling techniques can also be used.
Unincorporated label typically is removed by gel filtration chromatography or other
established methods. The amount of radioactivity incorporated into the probe can be
quantified by measurement in a scintillation counter. The specific activity of the resulting
probe should preferably be approximately 4 x 10^6 dpm/pmol.
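The labeling figures can be sanity-checked with a little arithmetic. The sketch below converts the specific activity of the labeled ATP into dpm per pmol (1 Ci = 2.22 x 10^12 dpm). Assuming, as a simplification, one 32P per oligonucleotide and no decay, that value is an upper bound on probe specific activity. The sketch also computes how much probe is needed to reach 1 x 10^6 dpm/mL in the approximately 10 mL of hybridization mix per 150 mm filter specified in the hybridization step.

```python
DPM_PER_CURIE = 2.22e12  # 3.7e10 disintegrations per second x 60 s


def dpm_per_pmol_from_ci_per_mmol(ci_per_mmol: float) -> float:
    """Specific activity in dpm/pmol (1 mmol = 1e9 pmol)."""
    return ci_per_mmol * DPM_PER_CURIE / 1e9


def probe_pmol_needed(dpm_per_ml: float, volume_ml: float,
                      probe_dpm_per_pmol: float) -> float:
    """pmol of labeled probe giving dpm_per_ml in volume_ml of hybridization mix."""
    return dpm_per_ml * volume_ml / probe_dpm_per_pmol


# Assumes one 32P per oligonucleotide and no decay (an upper bound).
ceiling = dpm_per_pmol_from_ci_per_mmol(6000)
# 1e6 dpm/mL in 10 mL per filter, with a probe at 4e6 dpm/pmol.
per_filter = probe_pmol_needed(1e6, 10, 4e6)
print(f"ceiling {ceiling:.3g} dpm/pmol; {per_filter:.2f} pmol probe per filter")
```

On these assumptions the ceiling is about 1.3 x 10^7 dpm/pmol, so the recommended 4 x 10^6 dpm/pmol corresponds to roughly 30% of the maximum, and each filter needs about 2.5 pmol of probe.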

The bacterial culture containing the pool of full-length clones should preferably be
thawed and 100 µl of the stock used to inoculate a sterile culture flask containing 25 ml of
sterile L-broth containing ampicillin at 50-100 µg/ml (for XL-2Blue strains, 25 µg/ml
tetracycline should also be used). The culture should preferably be grown to saturation at
37°C, and the saturated culture should preferably be diluted in fresh L-broth. Aliquots of
these dilutions should preferably be plated to determine the dilution and volume which will
yield approximately 5000 distinct and well-separated colonies on solid bacteriological media
consisting of L-broth with ampicillin at 100 µg/ml (for XL-2Blue strains, 25 µg/ml
tetracycline should also be used) and agar at 1.5% in a 150 mm petri dish when grown
overnight at 37°C. Other known methods of obtaining distinct, well-separated colonies can
also be employed.
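The trial plating described above amounts to a titer calculation. In the sketch below, the colony count, trial dilution and plated volume are hypothetical numbers chosen only to illustrate the arithmetic for a target of about 5000 colonies per 150 mm plate.

```python
def titer_cfu_per_ml(colonies: int, fold_dilution: float, plated_ml: float) -> float:
    """Colony-forming units per mL of the undiluted culture, from one trial plate."""
    return colonies * fold_dilution / plated_ml


def ml_to_plate(target_colonies: float, titer: float, fold_dilution: float) -> float:
    """Volume of a given dilution expected to yield target_colonies on one plate."""
    cfu_per_ml_of_dilution = titer / fold_dilution
    return target_colonies / cfu_per_ml_of_dilution


# Hypothetical trial: 120 colonies from 0.1 mL of a 10^6-fold dilution.
titer = titer_cfu_per_ml(120, 1e6, 0.1)  # about 1.2e9 cfu/mL
volume = ml_to_plate(5000, titer, 1e5)   # mL of a 10^5-fold dilution
print(f"{titer:.2e} cfu/mL; plate {volume:.2f} mL of the 1:100,000 dilution")
```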

Standard colony hybridization procedures should then be used to transfer the colonies
to nitrocellulose filters and to lyse, denature and bake them. The filter is then preferably
incubated at 65°C for 1 hour with gentle agitation in 6 x SSC (20 x stock is 175.3 g
NaCl/liter, 88.2 g Na citrate/liter, adjusted to pH 7.0 with NaOH) containing 0.5% SDS, 100
µg/ml of yeast RNA, and 10 mM EDTA (approximately 10 mL per 150 mm filter).
Preferably, the probe is then added to the hybridization mix at a concentration greater than or
equal to 1 x 10^6 dpm/mL. The filter is then preferably incubated at 65°C with gentle agitation
overnight. The filter is then preferably washed in 500 mL of 2 x SSC/0.5% SDS at room
temperature without agitation, preferably followed by 500 mL of 2 x SSC/0.1% SDS at room
temperature with gentle shaking for 15 minutes. A third wash with 0.1 x SSC/0.5% SDS at
65°C for 30 minutes to 1 hour is optional. The filter is then preferably dried and subjected to
autoradiography for sufficient time to visualize the positives on the X-ray film. Other known
hybridization methods can also be employed.
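The SSC recipe above can be checked against the molar masses of its components. The sketch assumes that "Na citrate" means trisodium citrate dihydrate (294.10 g/mol). It uses C1V1 = C2V2 to size dilutions of the 20 x stock for the hybridization and wash volumes given above.

```python
NACL_G_PER_MOL = 58.44
# Assumed form of "Na citrate": trisodium citrate dihydrate.
NA3_CITRATE_DIHYDRATE_G_PER_MOL = 294.10


def molar(grams_per_liter: float, g_per_mol: float) -> float:
    return grams_per_liter / g_per_mol


def ml_of_stock(final_x: float, final_ml: float, stock_x: float = 20.0) -> float:
    """C1*V1 = C2*V2: volume of 20 x SSC stock needed for final_ml at final_x."""
    return final_x * final_ml / stock_x


print(f"20 x SSC: {molar(175.3, NACL_G_PER_MOL):.2f} M NaCl, "
      f"{molar(88.2, NA3_CITRATE_DIHYDRATE_G_PER_MOL):.2f} M citrate")
print(f"10 mL of 6 x SSC needs {ml_of_stock(6, 10):.1f} mL of stock")
print(f"500 mL of 2 x SSC needs {ml_of_stock(2, 500):.1f} mL of stock")
```

With these molar masses the stock works out to about 3.0 M NaCl and 0.30 M citrate. Each 10 mL of 6 x SSC hybridization solution takes 3.0 mL of stock, and each 500 mL 2 x SSC wash takes 50 mL.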

The positive colonies are picked, grown in culture, and plasmid DNA isolated using 
standard procedures. The clones can then be verified by restriction analysis, hybridization 
analysis, or DNA sequencing. 

Alternatively, clones may be grown as described above, and PCR used to isolate the 
insert DNAs. Methods of PCR are described below and are otherwise well known.

ERROR SCREENING 

The DNA sequences found herein derive from individual clones, which are publicly 
available, as noted above. Thus, the skilled artisan will recognize that any specific sequence 
disclosed herein readily can be screened for errors by resequencing a particular fragment, in 
both directions (i.e., by sequencing both strands). Alternatively, error screening can be 
performed by amplifying and/or cloning any of the inventive DNAs, using, for example,
RT-PCR, and sequencing the resulting amplified product. In the event that there is a sequencing
error, reference should be made to the deposited clone as the correct sequence. 
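Two-strand resequencing reduces to a simple comparison once the reads are on the same coordinates. The sketch below flags positions where the forward read and the reverse-complemented reverse read agree with each other but not with the listed sequence; such positions would then be checked against the deposited clone. As a simplification, it assumes that both reads already span exactly the listed region with no insertions or deletions. Real reads would first need to be aligned.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def confirmed_discrepancies(reference: str, forward_read: str, reverse_read: str):
    """(position, listed base, base seen on both strands) where the strands agree
    with each other but differ from the listed sequence.

    Simplifying assumption: both reads already cover exactly the reference
    region, with no insertions or deletions (no alignment step is performed).
    """
    ref = reference.upper()
    fwd = forward_read.upper()
    rev = reverse_complement(reverse_read)  # bring the reverse read onto the listed strand
    if not len(ref) == len(fwd) == len(rev):
        raise ValueError("reads must span the reference region without gaps")
    return [
        (position, r, f)
        for position, (r, f, b) in enumerate(zip(ref, fwd, rev), start=1)
        if f == b and f != r and f != "N"
    ]


print(confirmed_discrepancies("ACGTACGT", "ACGAACGT", reverse_complement("ACGAACGT")))
# [(4, 'T', 'A')]
```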

USES AND BIOLOGICAL ACTIVITIES OF THE INVENTIVE MOLECULES 

The inventive molecules and their derivatives are susceptible to a wide variety of uses, 
based on functional and/or structural properties. The skilled worker will appreciate, based on 
the biological activities detailed below, and discussed with regard to the individual sequences 
disclosed below, that the inventive molecules will find usefulness in numerous therapeutic and 
diagnostic applications. 

The DNA molecules, especially the potassium salts thereof, can be used as fertilizer 
supplements due to their high nitrogen and phosphorus contents. Since the DNAs are of 
defined length, they are also useful in gel electrophoresis as molecular weight markers. Due 
to their similarity with known molecules, certain of the DNA molecules and their variants and 
derivatives may be used in any number of different diagnostic procedures and therapeutic 
applications. They may also be used to make the encoded proteins. 
The proteins themselves have many possible uses. They may be used as a nutritional
supplement for humans, animals and even for laboratory use as, for example, medium for 
bacterial cultures. Moreover, since the proteins are of defined, known sizes, they may be 
used as molecular weight markers for gel electrophoresis and gel filtration. Because they are 
of defined sequences, they also have use in microsequencing and protein fingerprinting 
applications. 

Expression Profiling Applications 

Given their known tissue expression and functional associations, assemblages of the 
inventive proteins (or corresponding antibodies) and nucleic acids are particularly suited to 
expression profiling applications. Expression profiling generally entails constructing an array 
of indicators that signal the presence of a particular RNA or protein expression product. Such 
arrays can be used to evaluate, for example, pharmacological effectiveness and toxicity. In 
particular, expression profiles from such arrays can be generated from cells treated with 
known compounds, having known properties, and these profiles can be compared to profiles 
of unknowns to evaluate similarities and differences, which can be correlated with efficacy or 
toxicity. 
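Comparing an unknown profile with profiles of reference compounds requires a similarity measure, which the text does not specify. As one common choice, the sketch below uses the Pearson correlation over the same ordered set of arrayed targets and ranks the reference compounds. The compound names and values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (norm_a * norm_b)


def rank_references(unknown, references):
    """Sort reference compounds by similarity of their profile to the unknown's."""
    scores = [(name, pearson(unknown, profile)) for name, profile in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical log-ratio profiles over the same four arrayed targets.
references = {
    "reference_compound_1": [2.0, -1.0, 0.5, 0.0],
    "reference_compound_2": [0.1, 0.0, -0.1, 0.05],
}
print(rank_references([1.8, -0.9, 0.6, 0.1], references))
```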

Additional uses of profiling include diagnosis, tracking development, and ascertaining
signaling and metabolic pathways. For examples of references describing profiling and its
uses, see Farr et al., U.S. Patent No. 5,811,231 (1998); Seilhamer et al., U.S. Patent No.
5,840,484 (1998); Rine et al., U.S. Patent No. 5,777,888 (1998); WO 97/27317; WO 99/05323;
WO 99/09218; and WO 99/14369. For a device for implementing such techniques, see Lipshutz
et al., U.S. Patent No. 5,856,174 (1999) and Anderson et al., U.S. Patent No. 5,922,591 (1999).

In one embodiment, a subset of the inventive DNAs will be arrayed on a substrate, 
like a gene chip, a filter or a 96-well plate. Test samples containing cells are maintained in 
the presence of a label capable of incorporation into nascent mRNA. Samples are treated with 
test and control compounds, which will induce mRNA expression in the sample, resulting in 
incorporation of label. Whole mRNA is isolated and applied to the array such that it 
hybridizes with the DNAs contained therein. After washing, the amount of hybridization is 
quantified and a profile is generated. These steps are repeated with various control and test 
compounds, thereby generating a library of profiles, which can be used to ascertain the 
relationships relevant to pharmacological efficacy or toxicity. 
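Quantified hybridization signals are usually turned into a profile by comparing treated and control samples. The sketch below assumes one intensity per arrayed DNA for each sample. It applies global normalization (scaling the treated sample to the control's total signal) followed by a per-probe log2 ratio. Both choices are illustrative assumptions, not a procedure given in the text.

```python
import math


def log2_ratio_profile(treated, control, floor=1.0):
    """Per-probe log2(treated/control) after global normalization.

    Intensities below `floor` are raised to it so that empty spots do not give
    log(0); the treated channel is then scaled to the control channel's total.
    """
    if len(treated) != len(control):
        raise ValueError("both hybridizations must cover the same probes")
    t = [max(x, floor) for x in treated]
    c = [max(x, floor) for x in control]
    scale = sum(c) / sum(t)
    return [math.log2(ti * scale / ci) for ti, ci in zip(t, c)]


print(log2_ratio_profile([400, 200, 200], [100, 100, 200]))
# [1.0, 0.0, -1.0]
```

Profiles generated this way for several control and test compounds form the library of profiles that the paragraph above describes.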
The matrices used in such profiling, however, need not be limited to those utilizing
DNAs. Rather, other nucleic acids, like RNAs and peptide nucleic acids (PNAs), as well as
the inventive proteins and antibodies corresponding to the inventive proteins may also be 
employed. Hence, for example, antibodies could form the array and the samples could be 
treated in order to label nascent proteins. Whole proteins then would be isolated and applied 
to the antibody matrix. Developing the resulting signal would result in a protein expression 
profile, which is useful in essentially the same manner as the nucleic acid profile. A protein 
matrix could be used, for example, in evaluating antibody responses to pharmaceutical agents 
in order to eliminate possible cross-reactivity. 

Moreover, where nucleic acids are used in the matrix, it is often beneficial to use 
variants (as defined below) of the molecules described herein. Such variants can account
for genetic variations that are of little or no consequence to the function of the resultant gene 
product. Hence, they can account for wobble or conservative amino acid variations that do 
not perturb function, like variations in some of the protein motifs elucidated below. Thus, 
each position in the matrix can employ multiple nucleic acid probes that account for a series 
of variants. 

Expression profiling may also be done, in another embodiment, using two- 
dimensional protein gels in which the inventive proteins are detected. The resultant profiles 
can be used in the same way as described. 

Matrices useful for profiling may be constructed based on different criteria. Of
course, the more relevant profiles will take into account expression of most human genes,
preferably all of them. In certain situations, however, it is advantageous to look at a smaller
subset. For example, if one were concerned about fetal neural toxicity, a fetal brain-specific
matrix might be chosen. On the other hand, if one were interested in targeting mammary
carcinoma tissue, a corresponding matrix could be used. Thus, matrices may be constructed
using all of the sequences available from a tissue-specific library.
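Building a tissue-specific matrix from the clone names is mostly bookkeeping. The sketch below selects every clone from one library, using the library codes described earlier, and assigns each clone to a well of a 96-well plate in row-major order. The clone name DKFZphfbr2_2a1 in the usage example is hypothetical.

```python
from itertools import product

# 96-well layout A1..A12, B1..B12, ..., H12 (row-major order).
WELLS = [f"{row}{col}" for row, col in product("ABCDEFGH", range(1, 13))]


def library_code(clone_name: str) -> str:
    """'DKFZphfbr2_16f21' -> 'hfbr' (producer/plasmid tags and version digit removed)."""
    prefix = clone_name.split("_", 1)[0]
    if prefix.startswith("DKFZp"):
        prefix = prefix[len("DKFZp"):]
    return prefix.rstrip("0123456789")


def tissue_plate(clone_names, code: str):
    """Assign every clone from one library to a well of a single 96-well plate."""
    selected = sorted(name for name in clone_names if library_code(name) == code)
    if len(selected) > len(WELLS):
        raise ValueError("more clones than wells; split across additional plates")
    return dict(zip(WELLS, selected))


names = ["DKFZphfbr2_16f21", "DKFZphmcf1_1c23", "DKFZphfbr2_2a1"]
print(tissue_plate(names, "hfbr"))
```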

* * * 

The following discussion relates to some of the various functional and structural 
groupings that would be of interest to the artisan wishing to construct profiling matrices. 
Of course, the artisan will also recognize that these functional descriptions may find
additional applicability in the therapeutic and diagnostic applications discussed below. 
Cell Cycle

A proliferating cell must coordinate replication and chromosomal separation to ensure 
that the genome is replicated completely, and that a single copy is correctly inherited by each 
daughter cell. The cell cycle is the coordinated series of events that achieves these aims. 
Many of the key events are initiated by a family of conserved serine/threonine protein
kinases, the cyclin-dependent kinases (CDKs), that are activated by the cyclin family of 
proteins (cyclins A-H). In turn, the cyclin-CDK complexes are modulated by other protein 
kinases or phosphatases, and by binding specific inhibitor proteins. The enormous variety of 
ways in which CDK activity can be regulated allows the cell to respond to internal signals 
generated by preceding events in the cell cycle and to external growth signals. 

The somatic cell cycle is divided into four phases: DNA replication (S phase) and 
chromosome separation (M phase) are separated by gap phases (G1 and G2). At specific
control points the decision to begin the next stage (DNA synthesis or mitosis) is carefully 
regulated. 
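
As a toy illustration only, the phase order and the two control points described above can be modeled as a cyclic sequence guarded by two gates. The `Phase` enum, the `next_phase` function and its gate parameters are hypothetical and deliberately ignore the underlying biochemistry.

```python
from enum import Enum


class Phase(Enum):
    G1 = "G1"  # gap phase before DNA replication
    S = "S"    # DNA replication
    G2 = "G2"  # gap phase before mitosis
    M = "M"    # chromosome separation (mitosis)


# Fixed order of the somatic cell cycle; after M the cycle returns to G1.
ORDER = [Phase.G1, Phase.S, Phase.G2, Phase.M]


def next_phase(current: Phase, may_replicate: bool = True, may_divide: bool = True) -> Phase:
    """Advance one phase, holding the cell at a control point if its gate is closed.

    Hypothetical gates: `may_replicate` stands for the decision to begin DNA
    synthesis (G1 -> S); `may_divide` for the decision to begin mitosis (G2 -> M).
    """
    if current is Phase.G1 and not may_replicate:
        return Phase.G1  # held before S phase
    if current is Phase.G2 and not may_divide:
        return Phase.G2  # held before mitosis
    return ORDER[(ORDER.index(current) + 1) % len(ORDER)]
```

Starting from G1 with both gates open, successive calls return S, G2, M and then G1 again; closing `may_replicate` keeps the cell in G1.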

Cdk2, the primary kinase, is especially required for the G1-S transition and S phase.
Cdk4 and Cdk6 are involved at the restriction point, where the cell can decide to proliferate or
arrest (G1<->G0), and Cdk7 is a CDK-activating kinase (CAK) as well as a subunit of TFIIH.

The cyclin-CDK complexes are regulated in various ways. One is through
phosphorylation, both activating (by the CDK-activating kinase, CAK) and inhibitory (by the
Y15 kinase Wee1), and through dephosphorylation by CDK-associated phosphatases (CAP),
such as Cdc25A, a member of the Cdc25 family (Cdc25A, B and C).

Another way of regulation occurs through two classes of CDK inhibitors (CKI): first, the
INK4 proteins p15, p16, p18 and p19, which negatively regulate the cyclin D-CDK
complexes, and second, the p21 family with p21, p27 and p57.

The cell cycle is also regulated through ubiquitin-mediated proteolysis involving the
destruction of both cyclins and CDK inhibitors by the 26S proteasome, which requires a
ubiquitin-conjugating enzyme (UBC) and a ubiquitin ligase. The instability is conferred by
PEST regions (cyclins D and E) or by a ten-amino-acid region in the amino terminus (degradation
box) in the A- and B-type cyclins.
All these modifications play an important role in cellular localization, because
only nuclear CDK-cyclin complexes are functional in the cell cycle. During G1 phase of the
cell cycle, cyclins A, E and D are synthesized and bind to their cyclin-dependent kinase
(CDK) partners. CDK complexes containing cyclins A, E and D1 are then imported into and
concentrated within nuclei. Cdk6-cyclin D3 has been localized to both cytoplasmic and
nuclear compartments, although only the nuclear complex is active. As cells enter S phase,
cyclin A and cyclin E complexes remain within the nucleus, whereas cyclin D1 relocalizes to
the cytoplasm for proteolysis at the onset of S phase. Like Cdk2-cyclin A, Cdc2-cyclin A is
nuclear and remains so until it is degraded during mitosis. By contrast, as a result of ongoing
nuclear import and more rapid re-export, cyclin B1, which binds to Cdc2 upon synthesis
during S phase, is predominantly cytoplasmic. Cdc2-cyclin B2 is also cytoplasmic, although
this might occur through anchoring of the complex to some cytoplasmic constituent. At
prophase, phosphorylation of cyclin B1 promotes accumulation of Cdc2-cyclin B1 in the
nucleus, whereas cyclin B2 remains in the cytoplasm until nuclear envelope breakdown.

Two crucial regulators of Cdc2-cyclin B, Wee1 and Cdc25C, are responsible
for the G2-to-M control point. Wee1 is a nuclear protein throughout the cell cycle, whereas
Cdc25C binds to 14-3-3 proteins during interphase and remains predominantly cytoplasmic.
In some systems Cdc25C, like cyclin B1, moves abruptly into the nucleus just before
entry into mitosis.

The 110-kDa retinoblastoma tumor suppressor protein (RB), a member of the pRB family,
is an important regulator of cell-cycle progression and differentiation. By binding transcription
activators of the E2F family (E2F1-5) and the DP family (DP1-3), RB suppresses inappropriate
proliferation by arresting cells in G1, repressing the transcription of genes required for the
transition into S phase. Before the cell proceeds into S phase, RB becomes phosphorylated at
multiple sites by the cyclin-dependent protein kinases (CDKs) and loses its transcriptional
repressing activity. Phosphorylation of RB during late G1 phase results in the dissociation of
the E2F-RB repressor complex, which allows S-phase-specific genes to be transcribed. Cyclin
E is an evolutionarily conserved target of E2F and acts together with Cdk2 in late G1.

For a proliferating cell it is vital that only undamaged DNA is replicated, because if
DNA damage is substantial, its replication can lead to chromosome loss or rearrangement.
Thus, there is a G1->S checkpoint in late G1 that requires the tumor suppressor p53. A
p53-dependent G1 arrest is effected by the cyclin-dependent kinase inhibitor p21, whose
increased expression inhibits almost all cyclin-CDK complexes.

The kinase responsible for phosphorylating the as yet unidentified kinetochore component
in metaphase may be a member of the MAP kinase family and appears to be the proto-oncogene
product c-MOS, a cytostatic factor (CSF) in meiosis.

Several categories of proteins are coded for by clones of the invention within the 
overall group of "Cell cycle" and include, among others, the following:

Tumor suppressors (e.g. N33): Tumor-suppressor genes are known to be involved in
the control of cell growth and division, interacting with proteins that control the cell cycle.
The N33 gene is significantly methylated in tumor cells, a mechanism by which
tumor-suppressor genes are inactivated in cancer. The N33 gene has been reported by OMIM
(Online Mendelian Inheritance in Man at http://www.ncbi.nlm.nih.gov/htbin-post/Omin) to
be associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the
following disease: 1) prostate cancer suppression (OMIM *601385). Clones in this category
include: fbr2_2kl4.

C-TAK1 (Cdc25C-associated protein kinase): Cdc25C is a protein phosphatase that controls
entry into mitosis by dephosphorylating Cdc2. Cdc25C function is itself regulated by
phosphorylation: phosphorylation of serine 216 of Cdc25C mediates the binding of 14-3-3
protein to Cdc25C. C-TAK1 (Cdc twenty-five C associated protein kinase) phosphorylates
Cdc25C on serine 216 in vitro. Alterations in the gene coding for this protein kinase
have been reported by OMIM to be associated (as potentially diagnostic, therapeutic, causative,
and/or related, etc.) with pancreatic cancer (OMIM *60278). Clones in this category
include: tes3J7j3.

Cell structure and motility 

One of the major differences between prokaryotes and eukaryotes is the ability of the
eukaryotic cell to adopt very different shapes depending on its function during the
differentiation process. Animal cells vary from round to extended cylindrical forms such as
motor neurons or muscle cells. In humans, more than 100 different cell types can be
distinguished, each having a characteristic shape. The form of a cell is often closely related to
its capacity to move. Some completely differentiated cells, such as fibroblasts, can still change
their form actively and thereby migrate. Other cell types serve as motor elements, either
"macroscopically" like muscle cells or "microscopically" like ciliated epithelia. Such tasks
are fulfilled by a large class of proteins, responsible on the one hand for maintaining cell
structure and contact with neighboring cells or the intercellular matrix, and on the other hand for
cell motility. These topics cannot be regarded separately: the motility apparatus, for example,
must be anchored in the cytoskeleton. Three different types of filaments can be distinguished:
actin filaments, tubulin filaments (microtubules) and intermediate filaments, each present in
almost all types of cells.

Actin filaments (F-actin) are assembled from monomers (G-actin). In muscle cells, actin and
myosin (for both of which several paralogous genes are known), as well as many other
proteins, are constituents of the contractile apparatus.

The "thin" and "thick filaments" in a muscle cell consist mainly of actin and myosin, 
respectively. 

Several different proteins are responsible for the anchoring of the actin filaments in 
the Z-disks (e.g. alpha-actinin and desmin) or at the end of the myofibers in the cell 
membrane. 

Troponins I, C and T and tropomyosin, all associated with actin, confer the Ca++-dependent
triggering of contraction.

Length of the sarcomere is controlled by the giant protein titin. 

In smooth muscle, there is no troponin. Instead, contraction is controlled by
phosphorylation and dephosphorylation of myosin, the phosphorylation being catalyzed by a
specialized kinase (myosin light-chain kinase). Contractile fibers are not organized in sarcomeres.

Apart from contributing to muscle contraction, the actomyosin system is responsible 
for many other motions at cellular level, e.g. the amoeboid movement of pseudopodia or the 
fission of cells at the end of mitosis by a contractile ring. 

Besides this, actin fibers fulfill structural tasks such as maintaining the shape of
stereocilia or microvilli. Here, actin filaments are connected by proteins such as fimbrin. But
specialized structures like these are not the only ones that contain actin fibers. There is a network
covering the complete cell volume with F-actin as a major constituent. Whereas the actin
filaments in the structures mentioned above are relatively stable, this F-actin is highly
dynamic. The network structure is maintained by cross-linking proteins such as alpha-actinin,
fimbrin or filamin; turnover is regulated by gelsolin, villin, and various capping and
fragmenting proteins.

Microtubules are built up of alpha-beta tubulin heterodimers. Filament turnover is
achieved by the addition and release of subunits at different rates at the two ends; the
resulting cycle is called "treadmilling". Thirteen protofilaments of tubulin dimers build up
one subfiber (a microtubule), and one fiber contains two or three of those. A complete axoneme
consists of 9 peripheral and 2 central fibers. This "9+2" structure is the basis of cilia and
flagella, whose basal bodies, like centrioles, instead consist of nine microtubule triplets. In
flagella, several additional structures such as radial spokes exist. Nexin connects the fibers,
and dynein is the motor ATPase that shifts the fibers relative to each other. Several genetic
diseases, such as Kartagener syndrome, are caused by deficiencies of distinct proteins in cilia.

Besides this, microtubules are abundant in all types of cells. They are part of a
delivery system for organelles, e.g. in the Golgi apparatus. A further very important system
based on microtubules is the mitotic spindle, which is organized by the centrosomes. Besides
many other components, the major part of a centrosome is a pair of centrioles, each built up
of nine microtubule triplets. Most remarkably, new centrioles are not synthesized de novo but
generated by duplication of old ones.

Cytoplasmic microtubules are associated with many different proteins. Two major
classes are known: the MAPs ("microtubule-associated proteins", with molecular masses
between 200 and 300 kDa) and the much smaller tau proteins, with molecular masses between
60 and 70 kDa. These proteins regulate the treadmilling process and the interaction with other
structures in the cell.

Besides actin filaments and microtubules, the so-called intermediate filaments constitute a third
class of filaments. In contrast to the former two groups, they do not participate in motility, nor
are they dynamic structures subject to rapid turnover. The most important ones are
neurofilaments (in neurons), keratin filaments (mainly in epithelial cells), and vimentin
filaments (in many different cell types).



The biological function of both the cytoskeleton as well as contractile apparatus of a 
cell does not end at the cell membrane. Cells must be embedded in the extracellular matrix, 
all cells of a muscle must act as one single mechanical unit and epithelia must resist 
macroscopic mechanical forces. Hence, cell adhesion and the extracellular matrix are closely 
connected to the cytoskeleton. Vinculin is one of the proteins that serve as an anchor for
intracellular fibers (actin). Different types of desmosomes and tight junctions connect 
neighbor cells with intercellular fibers. On the inside, cytoplasmic plaques connect them to 
the cytoskeleton. These structures, on the one hand, serve as mechanical elements whereas 
gap junctions, on the other hand, connect cells metabolically. 

The extracellular matrix consists of a network of proteins, glycoproteins and 
polysaccharides. Different proteins are present in relation to different mechanical demands:
Elastin is found in tissues with high elasticity (lungs, heart) whereas collagen, a more hard- 
wearing protein, is found in tendons and ligaments. Fibronectin is an extracellular protein 
highly important for cell adhesion. 

Reference: Murray J et al. (1992): Cell Motil Cytoskeleton 22: 211-223.

Within the overall group of Cell Structure and Motility several categories of proteins 
are coded for by clones of the invention: 

Collagen alpha chain proteins: Proteins with the typical (xxG)n repeat of collagen
proteins and Pfam von Willebrand factor type A domain(s) are likely to be collagen alpha
chains. These proteins can find application in modulation of connective tissue, bone and
cartilage development and maintenance. OMIM reports that collagen alpha chains have
associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the
following diseases: 1) Osteogenesis imperfecta, type I (OMIM #166200); 2) Osteogenesis
imperfecta congenita (OMIM #166210); 3) Alport Syndrome, X-linked (OMIM #301050); 4)
Thrombasthenia of Glanzmann and Naegeli (OMIM *273800); 5) Ehlers-Danlos Syndrome,
Type VII (OMIM #130060); 6) Marfan Syndrome (OMIM #154700); 7) Alport Syndrome,
Autosomal Recessive (OMIM #203780); 8) Alpha-2-Deficient Collagen Disease (OMIM
203760); 9) Goodpasture Syndrome (OMIM 233450); 10) Osteogenesis Imperfecta,
progressively deforming, with normal sclerae (OMIM #259420); 11) Ehlers-Danlos
Syndrome, Type VII Autosomal Recessive (OMIM *225410); and 12) Osteogenesis
imperfecta, Type IV (OMIM #166220). OMIM reports that von Willebrand factor type A
domains have associations (as potentially diagnostic, therapeutic, causative, and/or related,
etc.) with the following diseases: 1) Hemophilia A (OMIM *306700); 2) Von Willebrand
Disease (OMIM *193400); 3) Giant Platelet Syndrome (OMIM *231200); 4) Thrombasthenia
of Glanzmann and Naegeli (OMIM *273800); 5) Congenital Thrombotic Disease due to
protein C deficiency (OMIM #176860); 6) Polycystic Kidney Disease 1 (OMIM *601313); 7)
Nephrogenic Diabetes Insipidus (OMIM *304800); 8) Factor V Deficiency (OMIM *227400);
and 9) Dentatorubral-Pallidoluysian Atrophy (OMIM *125370). Clones in this category
include: fbr2_2b5.

Radial spokehead protein: Radial spokehead proteins, e.g., Chlamydomonas 
reinhardtii radial spokehead protein of flagella or axoneme and the Strongylocentrotus 
purpuratus sea urchin spermatozoa protein p63, and human proteins with similarity thereto 
are important for the maintenance of a planar form of sperm flagellar beating. The human 
protein(s) can find application in modulating the structure of the human spermatozoa radial 
spoke head and modulation of sperm motility in men (e.g., in sterility). Clones in this 
category include: tes3_15i5. 

Ankyrins: Ankyrins are peripheral membrane proteins that interconnect integral
proteins with the spectrin-based membrane skeleton. Thus these proteins are involved in
coupling the cytoskeleton and the cell membrane. OMIM reports that ankyrins have associations
(as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following
diseases: 1) Hereditary Spherocytosis (OMIM *182900); 2) Hemolytic Poikilocytic Anemia
due to reduced ankyrin binding sites (OMIM 141700); 3) Atypical Elliptocytosis (OMIM
225450); 4) Autosomal recessive spherocytosis (OMIM #270970); 5) Werner Syndrome
(OMIM *277700); and 6) Rhesus-unlinked type Elliptocytosis (OMIM #130600). Clones in
this category include: tes3_1817.

FGD1-related F-actin-binding protein (Frabin/FGD1): FGD1-related F-actin-binding
protein (frabin) is a novel F-actin-binding protein. The FGD1 gene locus seems to be
responsible for faciogenital dysplasia or Aarskog-Scott syndrome (OMIM 305400). Frabin
binds F-actin and shows F-actin-cross-linking activity. Overexpression of frabin in Swiss 3T3
cells and COS7 cells induces cell shape change and c-Jun N-terminal kinase activation, as
described for FGD1. Because FGD1 has been shown to serve as a GDP/GTP exchange
protein for the small G protein Cdc42, it is likely that frabin is a direct linker between Cdc42
and the actin cytoskeleton. Cdc42p is an essential GTPase. In yeast, Cdc42p transduces signals
to the actin cytoskeleton to initiate and maintain polarized growth, and to mitogen-activated
protein kinase pathways involved in morphogenesis. In mammalian cells, Cdc42p regulates a
variety of actin-dependent events and induces the JNK/SAPK protein kinase cascade, which
leads to the activation of transcription factors within the nucleus. Clones in this category
include: tes3_72kl5.

Paramyosins: Paramyosin is a major structural component of the thick filaments of
invertebrate muscle. Paramyosins are promising antigens for immunization against several
parasites, such as Schistosoma mansoni. Clones in this category include: tes3_7b22.

Tuftelin: Tuftelin and enamelin are matrix proteins of the teeth. Like other proteins involved
in calcification, these proteins are also expressed in the uterine matrix. The new protein can
find application in modulation of tissue calcification, especially in the uterus. As reported by
OMIM, tuftelin has been associated (as potentially diagnostic, therapeutic, causative, and/or
related, etc.) with amelogenesis imperfecta (OMIM *600087). Clones in this category
include: utel_19g22.

Cell Adhesion Regulator (CAR1): CAR1 is involved in the regulation of cell-cell
adhesion. OMIM reports the association (as potentially diagnostic, therapeutic, causative,
and/or related, etc.) of CAR1 with tumor suppression by the reduction of tumor invasion
(OMIM *116935). Clones in this category include: utel_24j6.

Differentiation/Development 

Almost every multicellular organism originates from meiotic cell divisions and the
recombination of a paternal and a maternal set of chromosomes. After fertilization of the egg,
all cells of a body originate from this one cell. Thus the cells of the developing body are
initially genetically alike, but phenotypically they become very different. They are
specialized to a certain cell type, arranged in an organized pattern to form a certain type of
tissue, and the whole structure has the well-defined shape of an organ. All these features are
determined by the DNA sequence of the genome, which is reproduced in every cell. Each cell
acts on the genetic instructions given at a certain time and at a certain place of development
and plays its individual part in the multicellular organism. Cell differentiation may be divided
into three general steps: cell cycle exit, apoptosis protection and tissue-specific gene
expression. These processes are coordinated to provide the final and unique tissue
characteristics.



An animal cell that has achieved a certain level of development is said to be
determined. This differentiation of a cell may be irreversible, and in that case the cell may be
renewed only by simple duplication. Other cells are renewed by means of stem cells, which
are immortal (e.g. stem cells of the bone marrow, epidermal stem cells). The genetic control
of development is extensively studied in invertebrates and vertebrates. The classical animal
model is the fruit fly Drosophila and the modern model is the transgenic mouse. Animal
transgenesis has proven to be useful for physiological as well as physiopathological studies.
Besides the approach based on the random integration of a DNA construct in the mouse
genome, gene targeting can be achieved using totipotent embryonic stem cells for targeted
transgenesis. Transgenic mice are then derived from the embryonic stem cells. This allows
the introduction of null mutations in the genome (so-called knock-out) or the control of
transgene expression by the endogenous regulatory sequence of the gene of interest
(so-called knock-in). Mice can be created that express wild-type genes, mutant genes, marker
genes or cell-lethal genes in a tissue-specific manner. These animal models allow changes in
tissue and organ development to be followed, and lead to a better understanding of the cellular
function of many genes or to the generation of animal models for human diseases.
Fundamental problems in immunology, onset and development of cancer, regulation of fatty
acid metabolism, aspects of cardiovascular function, control of central nervous system
development, and analysis of reproductive development and function are only some examples
of research interests.

The final stage of cell differentiation is growth arrest. In animal tissues with rapid cell
turnover, terminally differentiated cells undergo programmed cell death. The cells have the
ability to kill themselves by activating an intrinsic cell suicide program when they are no
longer needed or have become seriously damaged. The execution of this program is termed
apoptosis. Apoptosis is important for the development and homeostasis of animals. The key
components of this program have been conserved in evolution from worms (C. elegans) to
insects (Drosophila) to humans. The roles of apoptosis include the sculpting of structures
during development, deletion of unneeded cells and tissues, regulation of growth and cell
number, and the elimination of abnormal and potentially dangerous cells. In this way
apoptosis provides a "quality control mechanism" that limits the accumulation of harmful cells,
such as virus-infected cells and tumor cells. On the other hand, inappropriate apoptosis is
associated with a wide variety of diseases, including AIDS, neurodegenerative disorders and
ischemic stroke. Because it is now clear that apoptosis is the result of an active, gene-directed
process, it should eventually be possible to manipulate this form of cell death by developing
drugs that interact with its recently identified mechanisms of action. Inducers of cell
differentiation, cell cycle arrest and apoptosis might be novel molecular targets for new
anticancer agents, in addition to the signaling pathways for growth factors and cytokines.

Proteins, factors, receptors and genes of importance in apoptosis:

Proteases:

- Calpain, an intracellular cysteine protease, exact role unknown.

- Caspase-1 to Caspase-11, a family of proteases synthesized as inactive
proenzymes. Targets of the activated enzymes include: poly(ADP-ribose) polymerase,
DNA-dependent protein kinase, U1 ribonucleoprotein, nuclear lamins and cytoskeleton
components (actin).

- Granzyme B, a serine protease released by cytotoxic T-cells.

Receptors:

- CD95 (synonyms: Fas, APO-1), a receptor protein of the TNF-receptor family,
which includes TNF-R1 and TNF-R2, with the common characteristic of a 70 amino acid
cytoplasmic domain.

- FADD (synonym: MORT-1), a cytoplasmic protein

- DR-3 (synonym: APO-3), a member of the TNF-receptor family

- DR-4 and DR-5

Genes:
- ced-3, ced-4 and ced-9 encode the general apoptotic and antiapoptotic program in
Caenorhabditis elegans. Apaf-3 is the mammalian homologue of ced-3. 



- Bcl-2 / Bcl-xL / Bax / Bcl-xS / Bak: a large gene family that can either inhibit or 
promote apoptosis. 

- Cytokine response modifier A, a cowpox virus gene whose gene product inhibits 
caspases. 

Others: 

- Caspase-activated DNase (CAD), which causes DNA fragmentation in the nucleus, and
its inhibitor (ICAD)

- Ceramide, a complex lipid that acts as a second messenger 

- c-Jun N-terminal kinase (JNK) is a proline-directed kinase 

- p53 protein, which is essential for the induction of apoptosis in response to chromosomal
damage.

- RAIDD, a death signal-transducing protein. 

- Receptor interacting protein (RIP) is an accessory protein with a death domain and a 
serine/threonine kinase activity. 

- Sphingomyelinase, an enzyme that hydrolyzes the complex lipid sphingomyelin to 
ceramide. 

- Tumor necrosis factor (TNF) is a type II membrane protein

- TNF-receptor-associated factor 2 (TRAF2) is an accessory protein that can bind to
both TNF-R1 and TNF-R2.

Within the overall group of Differentiation/Development, several categories of 
proteins are coded for by clones of the invention: 

Interleukins (e.g. Interleukin-7): Interleukin precursors related to interleukin-7, for
example, are expected to act as new growth factors for human B lineage cells. Additionally,
these proteins should induce the gene rearrangement of the T-cell receptor repertoire, leading
to thymocyte commitment, and subsequently induce both cytotoxic T cells and
lymphokine-activated killer cells. These interleukins could find clinical application in a variety
of conditions of hematolymphopoietic failure and in different tumors, because of their
recruitment of B lineage cells, cytotoxic T cells and lymphokine-activated killer cells (OMIM
*146660). Clones in this category include: tes3_35e21.

Testis-specific Y-encoded proteins: The TSPY genes are arranged in clusters on the Y
chromosome of many mammalian species. TSPY is believed to function in early 
spermatogenesis and is a candidate for GBY, the putative gonadoblastoma-inducing gene on 
the Y. Proteins of the TSPY-SET-NAP1L1 family represent proteins closely related to 
TSPY. These proteins seem to be involved in early spermatogenesis. Clones in this category 
include: fbr2_2dl5. 

Intracellular transport and trafficking 

Eukaryotic cells rely for their viability on the partitioning of many basic cellular 
processes into membrane-bounded organelles. These are the nucleus, endoplasmic reticulum 
(ER), Golgi apparatus, endosomes, lysosomal compartments, mitochondria and peroxisomes. 
Most molecules destined for the lysosome, cell surface and outside the cell are routed through 
the ER and Golgi, which together with the vesicular intermediates between them, comprise 
the secretory pathway (Palade 1975). In the ER and Golgi compartments proteins are sorted, 
modified and often assembled into complexes en route to their final destination. Incorrectly 
assembled proteins are retained in the ER until they fold correctly or are targeted for 
degradation. Additional proteins are translocated into and function within the lumenal spaces 
of organelles or are secreted. Thus a large proportion of proteins synthesized require targeting 
to membranes either for insertion into or transport across them. A major purpose of this is 
growth. The secretory pathway is dependent on an intact cytoskeleton and also closely linked 
to general metabolism by affecting ribosome biogenesis (Mizuta and Warner, 1994). A huge
number of proteins are required for targeting, translocation and sorting of newly synthesized
proteins.

The first step in sorting is the recognition of cis-acting targeting or signal sequences
that organelle-targeted proteins contain. This is carried out by cytosolic targeting factors
and/or receptors on the membrane to which the protein is targeted. In some cases the primary
sequences are extremely degenerate, with only the overall character being conserved
(hydrophobicity for an ER signal sequence, helical amphiphilicity for a mitochondrial targeting
sequence) (Kaiser et al., 1987; Lemire et al., 1989). Following the targeting step, proteins are
either inserted into or transported across the membrane (translocated) through a proteinaceous
apparatus (termed the translocon). The translocon includes or recruits motors to drive the
translocation process in the correct direction (Schatz and Dobberstein, 1996).
Defined intracellular protein transport steps: 

• ER

- targeting to the ER 

- translocation into the lumen of the ER and, depending on the presence of
certain signals in the peptide sequence, transport through the Golgi complex

• Mitochondria 

- targeting 

- translocation 

• Peroxisomes 

• The general secretory pathway 

- protein modification, assembly and quality control in the ER 

- vesicle-mediated trafficking 

- vesicle docking and fusion 

- transport through the Golgi apparatus and sorting at the trans-Golgi network

- transport to the cell surface 

- transport routes to the lysosome 

• Endocytosis 

• Specialized protein transport routes 

• Protein export from the cytoplasm 

References: Palade, G (1975) Science 189: 347-358; Mizuta et al. (1994) Mol Cell
Biol 14: 2493-2502; Kaiser et al. (1987) Science 235: 312-317; Lemire et al. (1989) J Biol
Chem 264: 20206-20215; Schatz et al. (1996) Science 271: 1519-1526.

Rab proteins 

In eukaryotic cells the compartmentalisation of processes is a prerequisite for tight
regulation of processes and activities. The cells contain a highly dynamic set of membrane
compartments that are responsible for packaging, sorting, secreting, and recycling proteins
and other molecules. Trafficking between organelles within the secretory pathway occurs as
vesicles derived from a donor compartment fuse with specific acceptor membranes, resulting
in the directional transfer of cargo molecules. This process is tightly controlled by the
Rab/Ypt family of proteins (reviewed by Novick and Zerial, 1997), a branch of the
superfamily of small GTPases. Rab proteins regulate a variety of functions, including vesicle
translocation and docking at specific fusion sites. Rabs may also play critical roles in
higher-order processes such as modulating the levels of neurotransmitter release in neurons, a
likely mechanism in the synaptic plasticity that underlies learning and memory (Geppert and
Südhof, 1998).

Small GTPases share a common three-dimensional fold that, in the GTP bound state, 
can bind a variety of downstream effector proteins. GTP hydrolysis leads to a conformational 
change in the "switch" regions that renders the GTPase unrecognizable to its effectors. In this 
way, by localizing and activating a select set of effectors, a common structural motif is used 
to control a wide array of distinct cellular processes. 

The final steps in membrane fusion are likely to be driven by a set of proteins known
as SNAREs. After a vesicle becomes docked, the cytoplasmic domains of VAMP (also
termed synaptobrevin) and syntaxin on opposing membranes, in combination with a
SNAP-25 molecule, coalesce into an elongated α-helical bundle (Poirier et al., 1998; Sutton et
al., 1998), which may lead to fusion. Because numerous SNARE isoforms have been identified
that localize to distinct membrane compartments, it was originally proposed that the
specificity of interaction between the SNARE proteins accounted for the specificity in
membrane trafficking. Recent results, however, indicate that SNAREs are not specific in their
ability to form complexes in vitro, suggesting that trafficking specificity requires additional
factors (Yang et al., 1999). In this regard, Rab proteins are strong candidates for governing
the specificity of vesicle trafficking. Like the SNAREs, many isoforms (40) of the Rab family
have been identified that localize to specific membrane compartments (reviewed by Novick
and Zerial, 1997).

Concomitant with the SNARE cycle, Rab proteins undergo a intricate cycle of 
membrane and protein interactions. Rabs are posttranslationally modified at C-terminal 
cysteines by the addition of two geranylgeranyl groups, which mediate membrane association 
when the Rab is in the GTP-bound state. After guanine nucleotide hydrolysis occurs, the Rab 
is extracted from the membrane upon forming a complex with a cytosolic GDP-dissociation 
inhibitor (GDI). This cytosolic intermediate is then recycled onto a newly forming vesicle,
most likely through a secondary factor termed a GDI dissociation factor (GDF), which
displaces GDI. After the Rab becomes membrane bound, a guanine nucleotide exchange
factor (GEF) promotes release of GDP and the subsequent loading of GTP. In its GTP-bound 
conformation, the Rab is then free to associate with its specific set of effectors, which can in 
turn trigger events leading to the eventual fusion of the vesicle with a target membrane. To 
complete the cycle, perhaps after or concurrent with membrane fusion, a GTPase activating 
protein (GAP) accelerates nucleotide hydrolysis, switching off the GTPase. The remaining 
GDP-bound Rab can then participate in a new round of fusion. 
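The Rab cycle described above can be read as a small state machine in which each regulator (GDF, GEF, GAP, GDI) drives one transition. The following Python sketch is illustrative only; the state labels, the function names and the strict one-regulator-per-step ordering are simplifying assumptions introduced here, not part of the described invention.

```python
from enum import Enum


class RabState(Enum):
    """Simplified Rab states (illustrative labels, not standard nomenclature)."""
    CYTOSOLIC_GDI = "GDP-bound, held in the cytosol by GDI"
    MEMBRANE_GDP = "GDP-bound, associated with a membrane"
    MEMBRANE_GTP = "GTP-bound, associated with a membrane"


# (regulator, current state) -> next state, following the cycle in the text.
TRANSITIONS = {
    ("GDF", RabState.CYTOSOLIC_GDI): RabState.MEMBRANE_GDP,  # GDF displaces GDI
    ("GEF", RabState.MEMBRANE_GDP): RabState.MEMBRANE_GTP,   # GDP released, GTP loaded
    ("GAP", RabState.MEMBRANE_GTP): RabState.MEMBRANE_GDP,   # GTP hydrolysis accelerated
    ("GDI", RabState.MEMBRANE_GDP): RabState.CYTOSOLIC_GDI,  # Rab extracted from membrane
}


def step(state: RabState, regulator: str) -> RabState:
    """Apply one regulator; raise if it does not act on a Rab in this state."""
    try:
        return TRANSITIONS[(regulator, state)]
    except KeyError:
        raise ValueError(f"{regulator} does not act on a Rab in state {state.name}") from None


def can_bind_effectors(state: RabState) -> bool:
    """Only the membrane-associated, GTP-bound Rab engages its effectors."""
    return state is RabState.MEMBRANE_GTP


def run_cycle(start: RabState = RabState.CYTOSOLIC_GDI,
              regulators=("GDF", "GEF", "GAP", "GDI")) -> list:
    """Return the sequence of states visited as each regulator acts in turn."""
    states = [start]
    for regulator in regulators:
        states.append(step(states[-1], regulator))
    return states


if __name__ == "__main__":
    for s in run_cycle():
        print(f"{s.name:14s} effector-competent={can_bind_effectors(s)}")
```

Encoding the transitions as a lookup table makes an out-of-order event (for example a GEF acting on a GDI-bound, cytosolic Rab) fail loudly rather than silently.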

Rab interactions with effectors are likely to regulate vesicle targeting and membrane 
fusion in three ways. First, a Rab may specifically facilitate vectorial vesicle transport. 
Vesicles are transported from their site of origin to acceptor compartments likely through 
associations with cytoskeletal elements and transport motors. A protein has been identified 
with a domain structure that suggests a connection between the cytoskeleton and the Rabs. 
This protein, called Rabkinesin-6, contains a kinesin-like ATPase motor domain followed by
a coiled-coil stalk region and an RBD that specifically binds Rab6 (Echard et al., 1998). An
additional link with the cytoskeleton is provided by the Rab effector, Rabphilin-3A. 
Rabphilin-3A has been shown in vitro to interact with α-actinin, an actin-bundling protein, but
only when not bound to Rab3A (Kato et al., 1996). These results raise the intriguing
possibility that Rab proteins regulate vesicle interactions with the cytoskeleton and thereby 
play an active role in targeting vesicles to their appropriate destinations. 

Second, Rab proteins may regulate membrane trafficking at the vesicle docking step. 
A number of Rab effectors, including Rabaptin-5, EEA1, Rabphilin-3A, and Rim, may serve 
as molecular tethers. Each effector protein contains an RBD, followed by a linker region (some
having the potential to form elongated coiled-coil structures), and a domain capable of 
interacting with a second Rab or the target membrane. Rabaptin-5, for example, contains two 
RBDs, one near the N terminus that specifically recognizes Rab4 and a second near the C 
terminus that binds Rab5 (Vitale et al., 1998). Both Rim, which is localized to the target
membrane, and Rabphilin-3A, which is localized to the vesicle, contain N-terminal RBDs and
C-terminal Ca2+-binding C2 domains, implicating these effectors in synaptic vesicle 
localization or docking in response to Ca2+ influx (Wang et al., 1997). Tethering effectors
may also recognize protein complexes on the acceptor membrane. Sec4p, a yeast Rab3A 
homolog, interacts with the exocyst (Guo et al., 1999), a complex of seven or more subunits
that is assembled at sites of vesicle fusion along the plasma membrane. The exocyst complex 
may therefore function as a landmark for Rab/effector-mediated vesicle docking. 

Third, once a vesicle has become tethered to its fusion site, Rab proteins may 
selectively activate the SNARE fusion machinery. The mechanism of this activation is 
unknown but may involve direct interactions of Rabs or, more likely, their effectors with 
SNAREs. For example, Hrs-2 is a protein that binds to SNAP-25 and contains a Zn2+-finger 
motif characteristic of Rab-binding proteins such as Rabphilin-3A, Rim, EEA1, and Noc2, 
suggesting that Hrs-2 may form a physical link between Rabs and SNAREs (Bean et al., 
1997). In addition, certain mutations in the syntaxin-binding protein Sly1p, the Sec1p
homolog utilized in ER-to-Golgi trafficking, eliminate the requirement for Ypt1p, a Rab
protein that functions at this trafficking step (Dascher et al., 1991). Rabs may therefore
regulate SNARE associations through Sec1 family members. In support of this idea, a Rab
effector was recently found to interact with a vacuole Rab, a Sec1p homolog, and a SNARE
protein (Peterson et al., 1999), which suggests that this effector serves to connect Rab and
SNARE function. In this way, Rabs and their effectors may facilitate the correct pairing of 
SNAREs. 

References: Dascher et al. (1991) Mol. Cell. Biol. 11, 872-885; Echard et al. (1998)
Science 279, 580-585; Geppert et al. (1998) Annu. Rev. Neurosci. 21, 75-95; Guo et al.
(1999) EMBO J. 18, 1071-1080; Kato et al. (1996) J. Biol. Chem. 271, 31775-31778;
Novick and Zerial (1997) Curr. Opin. Cell Biol. 9, 496-504; Peterson et al. (1999) Curr. Biol.
9, 159-162; Poirier et al. (1998) Nat. Struct. Biol. 5, 765-769; Vitale et al. (1998) EMBO J. 17,
1941-1951; Wang et al. (1997) Nature 388, 593-598; Yang et al. (1999) J. Biol. Chem. 274,
5649-5653.

Within the overall group of Intracellular Transport and Trafficking, several categories
of proteins are coded for by clones of the invention.
Rab proteins : 

Rab1B is essential for the intracellular transport of nascent low density lipoprotein
(LDL) receptor. It has been proposed as a universal mediator of endoplasmic reticulum to Golgi
transport of membrane glycoproteins in mammalian cells. Clones in this category include:
fbr2_2il7, fbr2_3bl6.
Rab10 appears concentrated on membranes in the perinuclear region. Rab10 has been
associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the
following diseases as reported by OMIM: 1) Choroideremia (OMIM *303199); and 2) Rett
syndrome (OMIM 312750). Clones in this category include: fbr2_62119.

In mice, Rab17 shows epithelial cell specificity. Rab17 has been proposed as a candidate gene
for the mouse mutations ln (leaden), Tw (twirler), and ax (ataxia). Cloned from a brain cDNA
library, the new putative Rab protein is expected to be involved in vesicle trafficking within
neuronal cells. These proteins can find application in modulating the transport of vesicles
inside neuronal cells, which is essential for the development of functional dendritic processes.
Clones in this category include: fbr2_41ml5.

Ankyrin G : The ankyrin 3 gene encodes a novel ankyrin, which is expressed in
multiple tissues, with very high expression at the axonal initial segment and nodes of Ranvier
of neurons in the central and peripheral nervous systems. Ankyrin G transcripts undergo
several tissue-specific forms of alternative mRNA processing. The different ankyrin G proteins
participate in maintenance/targeting of ion channels and cell adhesion molecules to nodes of
Ranvier and axonal initial segments. Ankyrin G has been associated (as potentially diagnostic,
therapeutic, causative, and/or related, etc.) with Werner syndrome (OMIM *277700). Clones
in this category include: fkd2_24p5.

Zn-T-transporters : The Zn-T-transporters are membrane proteins that facilitate
sequestration of zinc in endosomal vesicles. In the brain, ZnT-3 appears to be involved
in the accumulation of zinc in synaptic vesicles. Zinc (Zn) is an essential element in normal
development and metabolism. Recent studies show that in Alzheimer's disease, Zn functions
as a double-edged sword, affording protection against Alzheimer's amyloid beta peptide (the
major component of senile plaques) at low concentrations and enhancing toxicity at high
concentrations by accelerating aggregation of the amyloid beta peptide. These proteins can
find application in modulation of zinc transport in neuronal cells, thus providing means for
modulating Alzheimer's amyloid beta peptide plaque formation (OMIM *602878,
*602095). Clones in this category include: fbr2_62fl 0.

Metabolism 

This group includes proteins which are involved in the uptake and consumption of
nutrients, and enzymes which are part of the biochemical pathways for energy metabolism or
which are involved in the supply of building blocks for nucleic acids and proteins (NTPs,
dNTPs, amino acids) for DNA/RNA and protein synthesis, and of fatty acids (membranes), to
allow for the generation of higher order structures. This group constitutes the most important
and largest group in prokaryotes and lower eukaryotes. In more highly evolved organisms,
however, other protein classes such as 'signal transduction', 'cell cycle' and 'differentiation
and development' increase in importance and in number of representatives.

Proteins involved in the metabolism of energy and compounds (here: other than
nucleic acids or proteins) are usually the products of housekeeping genes; they are often
constitutively and/or ubiquitously expressed.

Several categories of proteins are coded for by clones of the invention within the 
overall group of Metabolism: 

NAT1, ARD1 : In yeast, ARD1 and NAT1 are required for the expression of N-terminal
protein acetyltransferase 1. NAT1 controls full repression of the silent mating type
locus HML, sporulation and entry into G0. ARD1 is involved in the assembly of the NAT1
complex. These proteins can find application in modulating NAT assembly and action and
therefore could be important in the metabolism of drugs and environmental mutagens (OMIM
*108345). Clones in this category include: fbr2_3g8.

Apolipoprotein E receptor : In LDL-receptors the class A domains form the binding 
site for LDL and calcium. The acidic residues between the fourth and sixth cysteines are 
important for high-affinity binding of positively charged sequences in LDLR's ligands. These 
proteins can find application in modulation of cholesterol binding and transport by LDL- 
receptors and LDL-binding proteins. In normal individuals, chylomicron remnants and very 
low density lipoprotein (VLDL) remnants are rapidly removed from the circulation by 
receptor-mediated endocytosis in the liver. In familial dysbetalipoproteinemia, or type III 
hyperlipoproteinemia (HLP III), increased plasma cholesterol and triglycerides are the 
consequence of impaired clearance of chylomicron and VLDL remnants because of a defect 
in apolipoprotein E. Accumulation of the remnants can result in xanthomatosis and premature 
coronary and/or peripheral vascular disease. OMIM reports that apolipoprotein E has
associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the
following diseases: 1) Familial hypercholesterolemia (OMIM 143890); 2) Familial combined
hyperlipidemia (OMIM 144250); and 3) Alzheimer disease (OMIM #104300). Clones in this
category include: fbr2_62017. 
Ubiquitin carboxyl-terminal hydrolases : Ubiquitin carboxyl-terminal hydrolases (EC
3.1.2.15) (UCH) (deubiquitinating enzymes) are thiol proteases that recognize and hydrolyze
the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the
processing of poly-ubiquitin precursors as well as that of ubiquitinated proteins. OMIM reports
that ubiquitin-specific proteases have associations (as potentially diagnostic, therapeutic,
causative, and/or related, etc.) with the following diseases: 1) Lung carcinoma (OMIM
*603486); 2) X-linked retinal diseases (OMIM *300050); 3) oncogenesis (OMIM *300050); 4)
ovarian cancer (OMIM *300050). Clones in this category include: fbr2_78k24; htes3_27dl.
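The processing of a poly-ubiquitin precursor by hydrolysis at the C-terminal glycine can be pictured as cutting a head-to-tail chain after each ubiquitin C-terminus. The Python sketch below is a deliberately simplified illustration: it assumes the precursor is given as a one-letter protein string and that every occurrence of the ubiquitin C-terminal hexapeptide LRLRGG (ending at Gly76) marks a cleavage site; the function name and the toy sequence are hypothetical.

```python
# The last six residues of ubiquitin; the final Gly is Gly76.
UBIQUITIN_C_TERMINUS = "LRLRGG"


def process_polyubiquitin(precursor: str) -> list:
    """Cut a head-to-tail poly-ubiquitin precursor after each C-terminal Gly-Gly.

    A highly simplified stand-in for UCH activity: the peptide bond following
    the terminal glycine of each ubiquitin unit is hydrolyzed. Any residues
    after the last unit are returned as a final piece.
    """
    pieces, start = [], 0
    idx = precursor.find(UBIQUITIN_C_TERMINUS)
    while idx != -1:
        cut = idx + len(UBIQUITIN_C_TERMINUS)  # bond after the terminal Gly
        pieces.append(precursor[start:cut])
        start = cut
        idx = precursor.find(UBIQUITIN_C_TERMINUS, start)
    if start < len(precursor):
        pieces.append(precursor[start:])       # leftover C-terminal extension
    return pieces


if __name__ == "__main__":
    unit = "MQIF" + UBIQUITIN_C_TERMINUS   # hypothetical, heavily truncated unit
    print(process_polyubiquitin(unit * 3 + "V"))
```

A real enzyme recognizes the folded ubiquitin, not a string motif; the sketch only shows where the cuts fall in a linear precursor.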

Phosphoserine signature (phosphoglucomutases, phosphomannomutases) : These
proteins take part in the conversion of hexose phosphates. OMIM reports that these proteins
have associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with
the following disease: Fanconi-Bickel syndrome (OMIM #227810). Clones in this category
include: fkd2_24bl5.

NADH ubiquinone oxidoreductase: NADH:ubiquinone oxidoreductase is the first
enzyme in the respiratory electron transport chain of mitochondria. It is a membrane-bound
multi-subunit protein. The bovine heart enzyme contains about 40 different polypeptides.
OMIM reports that these proteins have associations (as potentially diagnostic, therapeutic,
causative, and/or related, etc.) with the following disease: Branchio-oto-renal syndrome
(OMIM *6601445). Clones in this category include: fkd2_3ol7.

Transketolases : Transketolase requires thiamin pyrophosphate as cofactor and shows 
a wide specificity for both reactants, e.g. converts hydroxypyruvate and R-CHO into CO(2) 
and R-CHOH-CO-CH(2)OH. OMIM reports that these proteins have associations (as
potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following
disease: Wernicke-Korsakoff syndrome (OMIM *277730). Clones in this category include:
tes3_17117. 

Fatty acid-CoA synthetases/ligases : These proteins contain AMP-binding domain
signature(s), which are present in enzymes that act via an ATP-dependent covalent binding
of AMP to their substrate. This domain is found in several CoA synthetases, such as
acetate-CoA ligase (EC 6.2.1.1), long-chain-fatty-acid-CoA ligase (EC 6.2.1.3) and bile
acid-CoA ligase. OMIM reports that these proteins have associations (as potentially
diagnostic, therapeutic, causative, and/or related, etc.) with the following diseases: 1) Alport
syndrome, mental retardation and elliptocytosis (OMIM *300157); 2) Adrenoleukodystrophy
(OMIM *300100). Clones in this category include: tes3_35k!7.

ADP/ATP or Adenine Nucleotide Translocators : These proteins contain
mitochondrial energy transfer signature(s) and are most abundant in mitochondria. In its
functional state, the translocator is a homodimer of 30-kD subunits embedded asymmetrically
in the inner mitochondrial membrane. The dimer forms a gated pore through which ADP is
moved from the cytoplasm into the matrix in exchange for ATP, which is moved from the
matrix into the cytoplasm. OMIM reports that these proteins have associations (as
potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following
diseases: 1) cardiomyopathy (OMIM *103220); 2) myopathy (OMIM *103220);
3) progressive external ophthalmoplegia (OMIM *601227). Clones in this category include:
tes3_35nl2.

Carboxylesterases : OMIM reports that these proteins have associations (as potentially
diagnostic, therapeutic, causative, and/or related, etc.) with the following: 1) detoxification
of foreign compounds by hepatic carboxylesterase (OMIM *114835); 2) non-Hodgkin
lymphoma (OMIM *114835); 3) B-cell chronic lymphocytic leukemia (OMIM *114835);
4) rheumatoid arthritis (OMIM *114835). Clones in this category include:
tes3_35n9.

Heat shock proteins: OMIM reports that these proteins have associations (as
potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following: the
27-kD heat shock protein has been correlated with thermotolerance in response to
environmental challenges and developmental transitions (OMIM *602 1295). Clones in this
category include: utell_23el3.

Nucleic acid management 

The genetic information is stored in the form of nucleic acids in all organisms. Two
kinds of nucleic acids exist, DNA and RNA. Whereas the more stable DNA constitutes the
storage form of the genetic information in most organisms, the labile RNA, and in
particular mRNA, is an intermediate used for the temporally controlled expression of
specific genes.

In eukaryotes, DNA is usually a double-stranded linear molecule consisting of two
antiparallel strands, each made up of a deoxyribose-phosphate backbone carrying the four
bases A, C, G, and T. The DNA of some organisms is circular. The structure of DNA was
elucidated in 1953 by Watson and Crick. DNA is a directional molecule: the polarity of each
strand (5' to 3') is defined by the numbering of the carbon atoms of the sugar.
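Because the two strands are antiparallel and pair A with T and C with G, the partner of a strand written 5' to 3' is its reverse complement, also written 5' to 3'. The short Python sketch below illustrates this; it assumes an uppercase or lowercase A/C/G/T alphabet, and the function name is introduced here for illustration only.

```python
# Watson-Crick pairing: A<->T, C<->G (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand: str) -> str:
    """Given one strand written 5'->3', return the partner strand, also written 5'->3'.

    Complementing each base gives the partner read 3'->5'; reversing it
    restores the 5'->3' convention, which reflects the antiparallel pairing.
    Characters other than A, C, G, T (any case) are passed through unchanged.
    """
    return strand.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCGT"))  # ACGCAT
```

Applying the function twice returns the original strand, mirroring the fact that each strand of the double helix is the reverse complement of the other.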

The most important processes dealing with nucleic acids are: 

• replication (e.g. DNA polymerases, Telomerase) 

• transcription (RNA polymerases) 

• RNA processing (maturation - splicing and degradation) 

• in addition, enzymes and proteins exist which require a nucleic acid (mostly RNA) in the
active center to be functional (ribozymes, e.g. RNase P and the ribosome)

The DNA of a cell is replicated in the S phase of the cell cycle. Several enzymes carry
out the task of doubling this nucleic acid. Like all steps of the cell cycle, the process of
replication is tightly regulated. The enzyme DNA polymerase and several other proteins are
involved in this process. Whereas many prokaryotes have only one origin of replication
(i.e., the starting point of the replication cycle), eukaryotic DNAs (chromosomes) contain
multiple such start points. The switch from the synthesis (S) phase to the subsequent G2 or M
phase of the cell cycle depends on the completion of replication. Thus, a number of
proteins are involved in the replication itself as well as in the control of the
process. Since most eukaryotic chromosomes are linear structures, additional proteins and 
enzymes are necessary to make sure that the structure is maintained through successive 
generations. This includes those proteins necessary to build the three dimensional structure of 
chromosomes (e.g. histones) and the structural network of the nucleus and nucleolus 
(including the defined localization of transcriptionally active genes in the vicinity of nucleoli) 
but also such enzymes as telomerase which guarantees the integrity of the chromosomal ends. 
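Why multiple origins matter can be seen from a back-of-the-envelope estimate. Under idealized assumptions introduced here (all origins fire at once and are evenly spaced, each origin starts two forks moving in opposite directions, and every fork moves at the same constant speed), each fork copies length / (2 × number of origins) base pairs, so replication time falls in proportion to the number of origins. The Python sketch below uses hypothetical, round numbers only to show this scaling.

```python
def replication_time(length_bp: float, fork_speed_bp_per_min: float, n_origins: int) -> float:
    """Idealized minutes needed to copy one DNA molecule.

    Assumptions (illustration only): all origins fire simultaneously and are
    evenly spaced, each origin starts two forks moving in opposite directions,
    and every fork moves at the same constant speed. Each fork then copies
    length / (2 * n_origins) base pairs.
    """
    if n_origins < 1:
        raise ValueError("at least one origin of replication is required")
    if fork_speed_bp_per_min <= 0:
        raise ValueError("fork speed must be positive")
    return length_bp / (2 * n_origins * fork_speed_bp_per_min)


if __name__ == "__main__":
    # Hypothetical, round numbers chosen only to show the scaling.
    genome, speed = 100_000_000, 2_000
    for n in (1, 10, 1_000):
        print(n, "origin(s):", replication_time(genome, speed, n), "min")
```

Real origins fire asynchronously and fork speeds vary, so the estimate is a lower bound on the time a given chromosome needs, not a prediction.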

The expression of genes is usually performed in two steps. First, a messenger RNA
(mRNA) is produced (transcribed) in one to many copies; second, this mRNA is translated
into the protein product. The regulation of transcription is discussed under the separate
heading 'transcription factors', but the classes 'signal transduction', 'development', 'cell
cycle' and others are also affected, as the expression of certain genes determines the fate of a
cell or organism.

The primary transcript (hnRNA, heterogeneous nuclear RNA) is a single-stranded
one-to-one copy of the gene as it is located on the chromosome. The process of maturation
begins during transcription, before the protein can be translated. First, a 5' cap
structure is enzymatically and covalently added to the RNA, blocking the 5' end of the RNA.
Second, when the RNA polymerase has terminated polymerization, the 3' end is processed:
the cleavage and polyadenylation machinery recognizes the sequence AAUAAA or AUUAAA
(plus some minor variations) and cuts the RNA 10-30 nucleotides downstream, and the
enzyme poly(A) polymerase then adds varying numbers of adenine residues to the new 3' end
of the transcript. The length of the poly(A) sequence affects the stability of the RNA. Finally,
in the process of splicing, the introns present at the genomic level and also present in the
hnRNA are removed by the spliceosome, a large complex of proteins and small nuclear RNAs.
The mature mRNA is exported to the cytoplasm, where it is translated with the help of ribosomes.
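The 3'-end processing step can be illustrated by scanning a transcript for the two signal hexamers named above and reporting the 10-30 nucleotide window in which cleavage is expected. The Python sketch below is a simplification under stated assumptions: only AAUAAA and AUUAAA are searched (the minor variants are not modeled), distances are counted from the 3' end of the hexamer, positions are 0-based, and the function names, tail length and toy transcript are hypothetical.

```python
import re

# The two hexamers named in the text; other minor variants are not modeled.
# The look-ahead lets overlapping occurrences be reported.
POLY_A_SIGNAL = re.compile(r"(?=(AAUAAA|AUUAAA))")


def cleavage_windows(rna: str, min_dist: int = 10, max_dist: int = 30):
    """Return (signal_start, window_start, window_end) for each signal found.

    Distances are counted from the 3' end of the hexamer (an assumption);
    window_end is clipped to the length of the RNA. Positions are 0-based.
    """
    hits = []
    for match in POLY_A_SIGNAL.finditer(rna):
        signal_end = match.start() + len(match.group(1))
        window_start = signal_end + min_dist
        if window_start <= len(rna):
            hits.append((match.start(), window_start, min(signal_end + max_dist, len(rna))))
    return hits


def cleave_and_polyadenylate(rna: str, cut_site: int, tail_length: int) -> str:
    """Cut the RNA at cut_site and append a poly(A) tail of the given length."""
    return rna[:cut_site] + "A" * tail_length


if __name__ == "__main__":
    pre_mrna = "GGG" + "AAUAAA" + "C" * 40   # hypothetical toy transcript
    start, lo, hi = cleavage_windows(pre_mrna)[0]
    mature = cleave_and_polyadenylate(pre_mrna, lo, tail_length=20)
    print(start, lo, hi, mature)
```

Choosing the earliest position of the window as the cut site is arbitrary; in cells the exact site is set by additional downstream sequence elements not modeled here.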

The half-life of RNA is usually much shorter than that of DNA. Usually, mRNA is
degraded shortly after synthesis, to guarantee a well-defined window of expression of a given
gene. This regulation is necessary to specifically maintain or change the set of proteins
present in a cell at any time. Specific regions in the 3' UTR (untranslated region) determine
the stability of the mRNA in the cytoplasm before it is degraded by ribonucleases (RNases).
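The notion of a half-life can be made concrete with the usual first-order decay relation, fraction remaining = 0.5^(t / half-life). The Python sketch below assumes simple first-order degradation with a constant half-life, which real mRNAs need not follow; the function names and the 30-minute half-life are hypothetical illustration values.

```python
import math


def fraction_remaining(t: float, half_life: float) -> float:
    """Fraction of molecules left after time t, assuming first-order decay."""
    return 0.5 ** (t / half_life)


def time_to_fraction(fraction: float, half_life: float) -> float:
    """Time after which only `fraction` (0 < fraction <= 1) of the molecules remain."""
    return half_life * math.log(fraction) / math.log(0.5)


if __name__ == "__main__":
    # Hypothetical half-life of 30 min: after 2 h, 1/16 of the molecules remain.
    print(fraction_remaining(120, 30))   # 0.0625
    print(time_to_fraction(0.0625, 30))
```

A short half-life means that, once transcription stops, the message and hence new protein synthesis disappear within a few half-lives, which is what produces a defined window of expression.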

References: Watson and Crick (1953) Nature 171: 737-738. 

Several categories of proteins are coded for by clones of the invention within the
overall group of "Nucleic acid management" and include, among others, the following:

RNA helicases including DEAD/H box helicases : RNA helicases comprise a large
family of proteins that are involved in basic biological processes such as nuclear and
mitochondrial splicing, RNA editing, rRNA processing, translation initiation,
nuclear mRNA export, and mRNA degradation. RNA helicases are essential factors in cell
development and differentiation, and some of them play a role in transcription and replication
of viral single-stranded RNA genomes. The members of the largest subgroup, the DEAD and
DEAH box proteins, exhibit a strong dependence of their unwinding activity on ATP
hydrolysis. DEAD box proteins have been associated (as potentially diagnostic, therapeutic,
causative, and/or related, etc.), as reported by OMIM, with the following disease processes and/or
genes: 1) ataxia-telangiectasia gene: "A human gene (DDX10) encoding a putative DEAD-box
RNA helicase at 11q22-q23", Genomics 33:199-206, 1996, Savitsky et al. (OMIM
*601235); 2) hematopoietic tumors: "Cloning and expression of a murine cDNA homologous
to the human RCK/P54, a lymphoma-linked chromosomal breakpoint 11q23", Gene 166:293-296,
1995, Seto et al. (OMIM *600326); 3) dermatomyositis: a) "The major dermatomyositis-specific
Mi-2 autoantigen is a presumed helicase involved in transcriptional activation",
Arthritis Rheum. 38: 1389-1399, 1995, Seelig et al. (OMIM *603277); b) "Two forms of the
major antigenic protein of the dermatomyositis-specific Mi-2 autoantigen" (Letter), Arthritis
Rheum. 39: 1769-1771, 1996, Seelig et al. (OMIM *603277); c) "The dermatomyositis-specific
autoantigen Mi2 is a component of a complex containing histone deacetylase and
nucleosome remodeling activities", Cell 95: 279-289, 1998, Zhang et al. (OMIM *603277); 4)
Muscular Dystrophy, Pseudohypertrophic Progressive, Duchenne and Becker Types (OMIM
*310200); 5) Mucopolysaccharidosis Type IVA (OMIM *253000); 6) Albinism I (OMIM
*203100); 7) Wilms Tumor 1 (OMIM *194070); 8) Spinocerebellar Ataxia 7 (OMIM
*164500). Clones in this category include: fbr2_23bl0, fbr2_3cl8, fbr2_6ol7, fbr2_82i24,
and tes3_14h21.
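The DEAD and DEAH box families are named after the conserved Asp-Glu-Ala-Asp / Asp-Glu-Ala-His tetrapeptide (motif II). As a first, very coarse filter, a protein sequence can be scanned for this tetrapeptide; the Python sketch below does only that. It assumes one-letter amino acid codes, the function name and toy sequence are hypothetical, and a four-residue match alone is frequent by chance, so actual family assignment would rely on the full set of conserved helicase motifs or a domain profile.

```python
import re

# Motif II of the family: Asp-Glu-Ala-Asp (DEAD) or Asp-Glu-Ala-His (DEAH).
DEAD_H_BOX = re.compile(r"DEA[DH]")


def find_dead_h_boxes(protein: str):
    """Return (1-based position, motif) for every DEAD or DEAH tetrapeptide.

    A four-residue match is common by chance; this is only a first filter,
    not a family assignment.
    """
    return [(m.start() + 1, m.group()) for m in DEAD_H_BOX.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "MKTDEADLLGDEAHK"   # hypothetical sequence, not a real helicase
    print(find_dead_h_boxes(toy))  # [(4, 'DEAD'), (11, 'DEAH')]
```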

Inorganic pyrophosphatase : Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) is the 
enzyme responsible for the hydrolysis of pyrophosphate (PPi), which is formed as the product
of many biosynthetic reactions that utilize ATP. All known PPases require the presence of
divalent metal cations, with magnesium conferring the highest activity. Clones in this 
category include: fbr2_64a 15. 

DNA-damage-inducible protein (dinP) or Proteins induced by DNA damage : The
dinB/P pathway is a second SOS pathway in E. coli. Genes related to this seem to be
involved in modulating DNA repair and mutagenesis. Clones in this category include:
fbr2_72bl8. 

Proteins with myc-type, helix-loop-helix dimerization domain signature(s) : This
helix-loop-helix domain, which mediates protein dimerization, has been found in proteins
such as the myc family of cellular oncogenes, proteins involved in myogenesis, and vertebrate
proteins that bind specific DNA sequences in various immunoglobulin chain enhancers.
Therefore, these proteins could be novel DNA-binding proteins. Clones in this category include:
fbr2_72112. 

Cytosolic ribosomal protein L36 : L36 seems to be part of the eukaryotic ribosomal
peptidyl transferase center and can find application in modulation of ribosome assembly, 
maintenance and activity. Clones in this category include: fkd2_3b2. 

Ribonuclease H : Ribonuclease H proteins degrade the RNA strand of RNA:DNA hybrids and
have been associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.)
with the following diseases as reported by OMIM: 1) Adenomatous Polyposis of the Colon
(OMIM *175100); 2) Retinoblastoma (OMIM *180200); and 3) Von Hippel-Lindau Syndrome
(OMIM *193300). Clones in this category include: phtes3_15j3.



Signal transduction 

Cells in higher order organisms need to communicate continuously with their
environment, especially with other cells of the same organism, in order to maintain the
function and specialization of the whole system of which these cells are part. This important
task of communication is performed with the help of cell-surface receptors, which receive
signals from outside the cell and transmit them into the cell.
G-proteins 

The largest known family of cell-surface receptors is that of the G-protein-coupled receptors,
which mediate the transmission of diverse stimuli such as neurotransmitters, glycopeptides,
hormones, peptides, odorant molecules, and photons. The functional unit of these receptors is
composed of the receptor molecule itself (GPCR), which is anchored in the plasma
membrane by seven membrane-spanning domains; the heterotrimeric G-protein, which is
composed of α and βγ subunits (Gα and Gβγ); and the effectors that interact with Gα and/or
Gβγ. In particular, the dissociated Gα and Gβγ can regulate the activities of a number of
effector molecules such as adenylate cyclases, phospholipase C isoforms, ion channels, and
tyrosine kinases, resulting in a variety of cellular functions. The process of signal
transduction must be tightly regulated and reversible in order to avoid overstimulation, to
achieve signal termination, and to render the receptor responsive to subsequent stimuli
[Iacovelli, L. et al. (1999) FASEB J. 13, 1-8; Hamm, H.E. (1998) J. Biol. Chem. 273,
669-672].

G-proteins are GTPases that change their conformation upon binding of GTP, which
in turn unmasks structural motifs, in particular the so-called effector loop, that can
mediate the interactions with target proteins, or effectors, of the GTPases. This ability enables
the GTPases to cycle between active, GTP-bound and inactive, GDP-bound conformations
and thereby to function as molecular traffic lights in a multitude of signal transduction
pathways. The most important signal transduction pathways regulated with the help of
G-proteins are the phospholipase C / protein kinase C pathway and the adenylate
cyclase / protein kinase A pathway.
The cycling of GTPases is tightly regulated by three main classes of proteins: the
exchange of GDP for a fresh GTP is facilitated by guanine nucleotide exchange
factors (GEFs), the hydrolysis of GTP to GDP is sped up by GTPase-activating proteins
(GAPs), and the dissociation of GDP from the GTPases is inhibited by GDP dissociation
inhibitors (GDIs) [Tapon and Hall (1997) Curr. Opin. Cell Biol. 9, 86-92; Van Aelst and
D'Souza-Schorey (1997) Genes Dev. 11, 2295-2322].
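How GEFs, GAPs and GDIs shift the balance between the active and inactive forms can be sketched with a two-state model introduced here as an illustration: GDP-bound protein becomes GTP-bound at a rate k_exchange (raised by GEFs, lowered by GDIs), and GTP-bound protein returns to the GDP-bound form at a rate k_hydrolysis (raised by GAPs). Setting the net rate of change to zero gives the steady-state active fraction k_exchange / (k_exchange + k_hydrolysis). The Python sketch below uses hypothetical rate constants and a hypothetical function name.

```python
def fraction_gtp_bound(k_exchange: float, k_hydrolysis: float) -> float:
    """Steady-state fraction of a GTPase in the active, GTP-bound form.

    Two-state model (an illustrative simplification):
        GDP-bound --k_exchange-->   GTP-bound  (raised by GEFs, lowered by GDIs)
        GTP-bound --k_hydrolysis--> GDP-bound  (raised by GAPs)
    Setting d(active)/dt = k_exchange*(1 - active) - k_hydrolysis*active to zero
    gives active = k_exchange / (k_exchange + k_hydrolysis).
    """
    if k_exchange < 0 or k_hydrolysis < 0 or k_exchange + k_hydrolysis == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k_exchange / (k_exchange + k_hydrolysis)


if __name__ == "__main__":
    # Hypothetical rate constants (per minute), chosen only to show the trend.
    base = fraction_gtp_bound(0.1, 0.1)
    with_gef = fraction_gtp_bound(1.0, 0.1)   # exchange raised tenfold
    with_gap = fraction_gtp_bound(0.1, 1.0)   # hydrolysis raised tenfold
    print(base, with_gef, with_gap)
```

The model ignores GTP/GDP concentrations and effector binding; it only shows the direction in which each class of regulator pushes the switch.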

SOC-family

A conserved motif that was originally identified in proteins that negatively regulate
the signaling action of cytokines was termed the SOCS box (Suppressor Of Cytokine
Signaling). Based on homology, five structurally distinct protein classes carrying this motif
have since been identified. The function of most of these proteins is presently not known.
The only feature common to these proteins is the SOCS box, which is located near the
C-terminus of the respective proteins. Recently, the SOCS box has been shown to mediate
binding of proteins to elongins B and C, which could target the proteins (and bound
substrates) to the proteasomal protein degradation pathway (Kamura, T. et al. (1998) Genes
Dev. 12, 3872-3881; Zhang, J.-G. et al. (1999) Proc. Natl. Acad. Sci. USA 96, 2071-2076).

The class in which the SOCS box was originally described contains several members
(SOCS-1 to SOCS-7 and CIS). In addition to the SOCS box, these proteins also contain an SH2
(Src-homology 2) domain and a variable N-terminus. These SOCS proteins appear to form
part of a classical negative feedback loop that regulates cytokine signal transduction. Upon
cytokine stimulation, expression of SOCS proteins is rapidly induced and the proteins inhibit
further cytokine action. The mode of action of the SOCS proteins varies. While SOCS-1
binds and inhibits the JAK (Janus kinase) family of cytoplasmic protein kinases [Narazaki,
M. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 13130-13134; Nicholson, S.E. et al. (1999)
EMBO J. 18, 375-385], CIS appears to act by competing with signaling molecules such as
the STAT (Signal Transducers and Activators of Transcription) family for binding to
phosphorylated receptor cytoplasmic domains [Yoshimura, A. et al. (1995) EMBO J. 14,
2816-2826; Matsumoto, A. et al. (1997) Blood 89, 3148-3154].

A second class of SOCS box proteins additionally contains WD-40 repeats; this class was
initially identified in the mouse WSB-1 and -2 proteins. The functions of WD-40 proteins are
not completely understood but seem to be rather divergent. In Cdc4p, the WD-40 repeats
are probably necessary for binding the substrate for Cdc34p [Mathias, N. et al. (1999) Mol.
Cell. Biol. 19, 1759-1767]. Cdc4p is a component of a ubiquitin ligase that tethers the
ubiquitin-conjugating enzyme Cdc34p to its substrates. The posttranslational modification of
a protein by ubiquitin usually results in rapid degradation of the ubiquitinated protein by the
proteasome. The transfer of ubiquitin to a substrate is a multistep process in which WD-40
repeats might play an important role.

Other WD-40-containing proteins (e.g. the retinoblastoma-binding protein RbAp48)
have been shown to bind metal ions (zinc), and this metal binding might mediate and/or
regulate protein-protein interactions which are functionally important in chromatin
metabolism [Kenzior, A.L. and Folk, W.R. (1998) FEBS Lett. 440, 425-429]. These proteins
are involved in the RAS-cAMP pathway that regulates cellular growth [Ach, R.A. et al.
(1997) Plant Cell 9, 1595-1606].

The SPRY domain has been identified in pyrin (marenostrin), a protein which is
mutated in patients with familial Mediterranean fever and which is similar to the
butyrophilin family. While butyrophilins seem to be involved in the lactation process in
mammals, the function of pyrin is unknown. Three proteins (SSB-1 to -3) have been identified
that contain both SPRY and SOCS box motifs. The function of these proteins is also not known.

Ankyrin repeat-containing proteins share a 33-residue repeating motif, an L-shaped
structure with protruding β-hairpin tips which mediate specific macromolecular interactions
with cytoskeletal, membrane, and regulatory proteins. These proteins play fundamental roles
in diverse biological activities including growth and development, intracellular protein
trafficking, the establishment and maintenance of cellular polarity, cell adhesion, signal
transduction, and mRNA transcription. Three ankyrin repeat proteins (ASB-1 to -3) have
been identified that contain a C-terminal SOCS box in addition to the ankyrin repeats. The
function of these proteins or of their individual domains remains to be discovered
[Hilton, D.J. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 114-119].

A few small GTPases (RAR and RAR-like) also contain a SOCS box. GTPases are
involved in signal transduction during cellular communication. The function of the SOCS box
in this type of protein is currently unclear [Hilton, D.J. et al. (1998) Proc. Natl. Acad. Sci.
USA 95, 114-119].

Ca2+ as second messenger

The bivalent cation Ca2+ is, besides cAMP, one of the two major second messengers
in eukaryotic cells. Its intracellular concentration is tightly regulated and usually kept very
low compared to the cell's environment. Ca2+-binding proteins and transporters (gap
junctions, voltage-gated and second messenger-gated channels) help to sequester huge
amounts of the ion in various organelles, from where Ca2+ can be released upon extracellular
stimuli. For example, muscle contraction depends on the presence of Ca2+ ions, which are
readily transported back into the organelles in order for the muscle to relax. In signal
transduction, Ca2+ functions as a second messenger that activates Ca2+-dependent processes
through the activation of Ca2+/calmodulin-dependent protein kinases (CaM kinases), which are
the major effector molecules of Ca2+. In the signaling cascades, the CaM-dependent kinases
activate phospholipases (e.g. phospholipase C) that in turn activate other protein kinases such
as protein kinase C.

cAMP 

Cyclic AMP (cAMP) is produced by the enzyme adenylate cyclase in response to
extracellular signals. Certain G proteins stimulate the activity of adenylate cyclase, which
converts ATP to cAMP and PPi. Two molecules of cAMP bind to each of the two regulatory (R)
subunits of cAMP-dependent protein kinase, which in turn dissociate from the two catalytic (C)
subunits of the R2C2 heterotetramer. Once released, the C subunits become active and
phosphorylate substrate proteins at Ser and Thr residues. The process leading from the binding
of extracellular molecules to their receptors, through the transmission of the stimulus into the
cell and the activation of adenylate cyclase, to the subsequent activation of cAMP-dependent
protein kinase is one of the two major signal transduction pathways in eukaryotic cells. Since
protein phosphorylation is a posttranslational modification, the kinases are described in the
class "signal transduction."

SARA 

Members of the transforming growth factor β (TGFβ) superfamily signal through a
family of cell-surface transmembrane serine/threonine kinases, known as type I and type II
receptors (Heldin et al., 1997; Attisano and Wrana, 1998; Kretzschmar and Massagué,
1998). Ligand induces the formation of heteromeric complexes of these receptors, and
signaling is initiated when the type I receptor is phosphorylated and activated by the
constitutively active kinase of the type II receptor (Wrana et al., 1994). The activated type I
receptor kinase then propagates the signal to a family of intracellular signaling mediators
known as Smads (a contraction of the C. elegans Sma and Drosophila Mad genes, which were
the first identified members of this class of signaling effectors).
Three classes of Smads with distinct functions have been defined: the receptor-regulated
Smads, which include Smad1, 2, 3, 5, and 8; the common mediator Smad, Smad4;
and the antagonistic Smads, which include Smad6 and 7 (Heldin et al., 1997; Attisano and
Wrana, 1998; Kretzschmar and Massagué, 1998). Receptor-regulated Smads (R-Smads) act
as direct substrates of specific type I receptors and are phosphorylated on the last
two serines at the carboxyl terminus, within a highly conserved SSXS motif (Macias-Silva et
al., 1996; Abdollah et al., 1997; Kretzschmar et al., 1997; Liu et al., 1997b; Souchelnytskyi
et al., 1997). Regulation of R-Smads by the receptor kinase provides an important level of
specificity in this system. Thus, Smad2 and Smad3 are substrates of TGFβ or activin
receptors and mediate signaling by these ligands (Macias-Silva et al., 1996; Liu et al., 1997b;
Nakao et al., 1997), whereas Smad1, 5, and 8 are targets of BMP receptors and propagate
BMP signals (Hoodless et al., 1996; Chen et al., 1997b; Kretzschmar et al., 1997;
Nishimura et al., 1998). Once phosphorylated, R-Smads associate with the common Smad,
Smad4 (Lagna et al., 1996; Zhang et al., 1997), and mediate nuclear translocation of the
heteromeric complex. In the nucleus, Smad complexes then activate specific genes through
cooperative interactions with DNA and other DNA-binding proteins such as FAST1, FAST2,
and Fos/Jun (Chen et al., 1996; Chen et al., 1997a; Liu et al., 1997a; Labbe et al., 1998;
Zhang et al., 1998; Zhou et al., 1998). In contrast to R-Smads and Smad4, the antagonistic
Smads, Smad6 and 7, appear to function by blocking ligand-dependent signaling (reviewed in
Heldin et al., 1997).
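The C-terminal phosphorylation rule above can be made concrete with a short Python sketch. It assumes that the SSXS motif must sit at the very end of the chain and simply reports the 1-based positions of the last two serines; the function name and the toy peptide are hypothetical and are not taken from any Smad sequence.

```python
import re

# C-terminal SSXS motif: Ser-Ser-(any residue)-Ser anchored at the end of the chain.
SSXS_AT_C_TERMINUS = re.compile(r"SS.S$")


def ssxs_phospho_sites(sequence: str) -> list[int]:
    """Return the 1-based positions of the last two serines of a C-terminal
    SSXS motif, or an empty list if the sequence does not end in SSXS.

    Hypothetical helper for illustration only.
    """
    seq = sequence.strip().upper()
    if not SSXS_AT_C_TERMINUS.search(seq):
        return []
    n = len(seq)
    # In S-S-X-S the last two serines are the 2nd and 4th residues of the motif,
    # i.e. positions n-2 and n when counting from 1.
    return [n - 2, n]


print(ssxs_phospho_sites("MKTPESSMS"))  # toy peptide, not a real Smad -> [7, 9]
```

A sequence that merely contains SSXS internally is not reported, reflecting the text's emphasis on the carboxyl terminus.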

Phosphorylation of R-Smads by the type I receptor is essential for activating the
TGFβ signaling pathway (Heldin et al., 1997; Attisano and Wrana, 1998; Kretzschmar and
Massagué, 1998). However, little is known of how Smad interaction with receptors is
controlled. A novel Smad2/Smad3-interacting protein has been described (Tsukazaki et al.,
1998) that contains a double zinc finger, or FYVE domain, and has been called SARA
(Smad anchor for receptor activation). SARA recruits Smad2 into distinct
subcellular domains and co-localizes and interacts with TGFβ receptors. TGFβ signaling
induces dissociation of Smad2 from SARA with concomitant formation of Smad2/Smad4
complexes and nuclear translocation. Moreover, deletion of the FYVE domain in SARA
causes mislocalization of Smad2 and inhibits TGFβ-dependent transcriptional responses.
Thus, SARA defines a component of TGFβ signaling that functions to recruit Smad2 to the
receptor by controlling the subcellular localization of Smad2.

Rab proteins 

In eukaryotic cells, compartmentalization is a prerequisite for the tight regulation of
cellular processes and activities. The cells contain a highly dynamic set of membrane
compartments that are responsible for packaging, sorting, secreting, and recycling proteins
and other molecules. Trafficking between organelles within the secretory pathway occurs as
vesicles derived from a donor compartment fuse with specific acceptor membranes, resulting
in the directional transfer of cargo molecules. This process is tightly controlled by the
Rab/Ypt family of proteins (reviewed by Novick and Zerial, 1997), a branch of the
superfamily of small GTPases. Rab proteins regulate a variety of functions, including vesicle
translocation and docking at specific fusion sites. Rabs may also play critical roles in
higher-order processes such as modulating the levels of neurotransmitter release in neurons,
a likely mechanism in the synaptic plasticity that underlies learning and memory (Geppert and
Südhof, 1998).

Small GTPases share a common three-dimensional fold that, in the GTP bound state, 
can bind a variety of downstream effector proteins. GTP hydrolysis leads to a conformational 
change in the "switch" regions that renders the GTPase unrecognizable to its effectors. In this 
way, by localizing and activating a select set of effectors, a common structural motif is used 
to control a wide array of distinct cellular processes. 

The final steps in membrane fusion are likely to be driven by a set of proteins known
as SNAREs. After a vesicle becomes docked, the cytoplasmic domains of VAMP (also
termed synaptobrevin) and syntaxin on opposing membranes, in combination with a SNAP-25
molecule, coalesce into an elongated α-helical bundle (Poirier et al., 1998; Sutton et al.,
1998), which may lead to fusion. Because numerous SNARE isoforms that localize to distinct
membrane compartments have been identified, it was originally proposed that the
specificity of interaction between SNARE proteins accounted for the specificity of
membrane trafficking. Recent results, however, indicate that SNAREs are not specific in their
ability to form complexes in vitro, suggesting that trafficking specificity requires additional
factors (Yang et al., 1999). In this regard, Rab proteins are strong candidates for governing
the specificity of vesicle trafficking. Like the SNAREs, many Rab isoforms (40) have been
identified that localize to specific membrane compartments (reviewed by Novick
and Zerial, 1997).

Concomitant with the SNARE cycle, Rab proteins undergo an intricate cycle of
membrane and protein interactions. Rabs are posttranslationally modified at C-terminal
cysteines by the addition of two geranylgeranyl groups, which mediate membrane association
when the Rab is in the GTP-bound state. After GTP hydrolysis occurs, the Rab
is extracted from the membrane upon forming a complex with a cytosolic GDP-dissociation
inhibitor (GDI). This cytosolic intermediate is then recycled onto a newly forming vesicle,
most likely through a secondary factor termed a GDI dissociation factor (GDF), which
displaces GDI. After the Rab becomes membrane bound, a guanine nucleotide exchange
factor (GEF) promotes release of GDP and the subsequent loading of GTP. In its GTP-bound
conformation, the Rab is then free to associate with its specific set of effectors, which can in
turn trigger events leading to the eventual fusion of the vesicle with a target membrane. To
complete the cycle, perhaps after or concurrent with membrane fusion, a GTPase-activating
protein (GAP) accelerates nucleotide hydrolysis, switching off the GTPase. The remaining
GDP-bound Rab can then participate in a new round of fusion.
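The Rab cycle described above can be summarized as a small finite-state machine. The Python sketch below is a deliberate simplification: the state labels and event names (GDF, GEF, effector binding, GAP, GDI) are shorthand chosen for illustration, and kinetics, competing pathways and the exact timing relative to membrane fusion are ignored.

```python
# Hypothetical state labels for the Rab cycle described in the text.
CYTOSOL = "GDP-Rab:GDI (cytosol)"
MEMBRANE_GDP = "GDP-Rab (membrane)"
MEMBRANE_GTP = "GTP-Rab (membrane)"
EFFECTOR_BOUND = "GTP-Rab:effector"

# (current state, event) -> next state
TRANSITIONS = {
    (CYTOSOL, "GDF"): MEMBRANE_GDP,                      # GDF displaces GDI at a forming vesicle
    (MEMBRANE_GDP, "GEF"): MEMBRANE_GTP,                 # GEF releases GDP, GTP is loaded
    (MEMBRANE_GTP, "effector binding"): EFFECTOR_BOUND,  # GTP-Rab recruits its effectors
    (EFFECTOR_BOUND, "GAP"): MEMBRANE_GDP,               # GAP accelerates GTP hydrolysis
    (MEMBRANE_GDP, "GDI"): CYTOSOL,                      # GDI extracts GDP-Rab from the membrane
}


def step(state: str, event: str) -> str:
    """Apply one event; reject events that do not act on the current state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} does not act on state {state!r}") from None


def run_cycle(events: list[str], state: str = CYTOSOL) -> list[str]:
    """Return the sequence of states visited while applying the events in order."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace


for s in run_cycle(["GDF", "GEF", "effector binding", "GAP", "GDI"]):
    print(s)
```

Encoding the cycle as an explicit transition table makes an out-of-order event, such as a GEF acting on the GDI-bound cytosolic form, fail loudly instead of silently changing state.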

Rab interactions with effectors are likely to regulate vesicle targeting and membrane
fusion in three ways. First, a Rab may specifically facilitate vectorial vesicle transport.
Vesicles are likely transported from their site of origin to acceptor compartments through
associations with cytoskeletal elements and transport motors. A protein has been identified
with a domain structure that suggests a connection between the cytoskeleton and the Rabs.
This protein, called Rabkinesin-6, contains a kinesin-like ATPase motor domain followed by
a coiled-coil stalk region and a Rab-binding domain (RBD) that specifically binds Rab6
(Echard et al., 1998). An additional link with the cytoskeleton is provided by the Rab effector
Rabphilin-3A. Rabphilin-3A has been shown in vitro to interact with α-actinin, an
actin-bundling protein, but only when not bound to Rab3A (Kato et al., 1996). These results
raise the intriguing possibility that Rab proteins regulate vesicle interactions with the
cytoskeleton and thereby play an active role in targeting vesicles to their appropriate
destinations.

Second, Rab proteins may regulate membrane trafficking at the vesicle docking step.
A number of Rab effectors, including Rabaptin-5, EEA1, Rabphilin-3A, and Rim, may serve
as molecular tethers. Each effector protein contains an RBD, followed by a linker region (some
having the potential to form elongated coiled-coil structures), and a domain capable of
interacting with a second Rab or the target membrane. Rabaptin-5, for example, contains two
RBDs, one near the N terminus that specifically recognizes Rab4 and a second near the C
terminus that binds Rab5 (Vitale et al., 1998). Both Rim, which is localized to the target
membrane, and Rabphilin-3A, which is localized to the vesicle, contain N-terminal RBDs and
C-terminal Ca2+-binding C2 domains, implicating these effectors in synaptic vesicle
localization or docking in response to Ca2+ influx (Wang et al., 1997). Tethering effectors
may also recognize protein complexes on the acceptor membrane. Sec4p, a yeast Rab3A
homolog, interacts with the exocyst (Guo et al., 1999), a complex of seven or more subunits
that is assembled at sites of vesicle fusion along the plasma membrane. The exocyst complex
may therefore function as a landmark for Rab/effector-mediated vesicle docking.

Third, once a vesicle has become tethered to its fusion site, Rab proteins may
selectively activate the SNARE fusion machinery. The mechanism of this activation is
unknown but may involve direct interactions of Rabs or, more likely, their effectors with
SNAREs. For example, Hrs-2 is a protein that binds to SNAP-25 and contains a Zn2+-finger
motif characteristic of Rab-binding proteins such as Rabphilin-3A, Rim, EEA1, and Noc2,
suggesting that Hrs-2 may form a physical link between Rabs and SNAREs (Bean et al.,
1997). In addition, certain mutations in the syntaxin-binding protein Sly1p, the Sec1p
homolog used in ER-to-Golgi trafficking, eliminate the requirement for Ypt1p, a Rab
protein that functions at this trafficking step (Dascher et al., 1991). Rabs may therefore
regulate SNARE associations through Sec1 family members. In support of this idea, a Rab
effector was recently found to interact with a vacuolar Rab, a Sec1p homolog, and a SNARE
protein (Peterson et al., 1999), which suggests that this effector serves to connect Rab and
SNARE function. In this way, Rabs and their effectors may facilitate the correct pairing of
SNAREs.

References: Dascher et al. (1991). Mol. Cell. Biol. 11, 872-885; Echard et al. (1998).
Science 279, 580-585; Geppert et al. (1998). Annu. Rev. Neurosci. 21, 75-95; Guo et al.
(1999). EMBO J. 18, 1071-1080; Kato et al. (1996). J. Biol. Chem. 271, 31775-31778;
Novick et al. (1997). Curr. Opin. Cell Biol. 9, 496-504; Peterson et al. (1999). Curr. Biol. 9,
159-162; Poirier et al. (1998). Nat. Struct. Biol. 5, 765-769; Vitale et al. (1998). EMBO J. 17,
1941-1951; Wang et al. (1997). Nature 388, 593-598; Yang et al. (1999). J. Biol. Chem. 274,
5649-5653.

Kinases 

Reversible posttranslational modifications of proteins are a major means of regulating
cellular activities. Among the various modifications carried out by cells, the
addition of phosphoryl groups to Ser/Thr or Tyr residues is the most important and most
widely used. The phosphorylation of proteins is accomplished by protein kinases, while the
reverse reaction, the removal of phosphoryl groups, is carried out by phosphatases. Kinases
and phosphatases occupy key regulatory positions in processes such as cell proliferation,
differentiation, and communication/signaling. These processes must be tightly regulated in
order to maintain a stable cellular state. Misregulation of kinase (or phosphatase) activities
has been implicated in a multitude of disease processes such as oncogenesis,
inflammatory processes, arteriosclerosis, and psoriasis.

Protein kinases constitute one of the largest protein families currently known. Several
hundred kinases have already been identified. Classically, kinases are subdivided into two
classes based on the amino acid residues that they phosphorylate in their substrates. The
kinases transfer phosphoryl groups from adenosine triphosphate (ATP) or, less frequently,
guanosine triphosphate (GTP), either to serine and/or threonine residues or to tyrosine
residues of substrate proteins. An estimated 1,000 to 10,000 proteins present in a typical
mammalian cell are believed to be regulated by the action of protein kinases.

Protein kinases are frequently integral parts of signaling cascades that transmit
extracellular stimuli (e.g. hormones, neurotransmitters, growth or differentiation factors) into
the cell and result in various cellular responses. The kinases play key roles in these
cascades as molecular switches that turn the activities of other enzymes and proteins on or
off, e.g. metabolic and regulatory enzymes, channels and pumps, receptors, cytoskeletal
proteins, and transcription factors.

The regulation of kinase activities is accomplished by various means: 

The best-characterized example of regulation via regulatory subunits is the
cAMP-dependent protein kinase (PKA), which is also a prototype for second
messenger-activated protein kinases. This enzyme is a heterotetramer of two catalytic (C) and
two regulatory (R) subunits. Upon binding of two molecules of the second messenger cAMP to
each R subunit, the catalytic subunits are released and become active. Several isoforms exist
of both the catalytic and the regulatory subunits. The combination of catalytic and regulatory
subunits determines the localization of the holoenzyme and also the spectrum of substrates
available for phosphorylation. The consensus sequence that a substrate must contain for
phosphorylation by PKA is RRXS/T, where X can be any amino acid.

Casein kinase II is another example of a holoenzyme that consists of
catalytic and regulatory subunits. Other kinases that are activated by second messengers are
cGMP-dependent protein kinase and protein kinase C (PKC), which is activated by
diacylglycerol, which in turn is produced by phospholipases through cleavage of
phosphatidylcholine.
Receptor kinases usually consist of an extracellular domain, which can bind effector
molecules (e.g. growth factors and hormones) and transfer the stimulus to the intracellular
domain of these proteins, which usually is a protein tyrosine kinase. Other tyrosine kinases
lack an extracellular domain but are associated with receptors, which transfer the signal after
effector binding by activating the associated protein kinase enzyme (e.g. the Src kinase family:
Src, Blk, Fgr, Fyn, Lck, Lyn, and Yes; and the Janus kinase family: Jak1-3 and Tyk2).
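The PKA consensus pattern RRXS/T given above lends itself to a simple sequence scan. The Python sketch below is a hypothetical helper that reports the position of each Ser/Thr acceptor matching the pattern; a match only marks a candidate site, since the consensus alone does not establish that a protein is phosphorylated in vivo.

```python
import re

# RRXS/T: Arg-Arg-(any residue)-Ser/Thr. The lookahead lets overlapping sites be found.
PKA_CONSENSUS = re.compile(r"(?=RR.[ST])")


def pka_candidate_sites(sequence: str) -> list[int]:
    """Return 1-based positions of the Ser/Thr acceptor in each RRXS/T match.

    Hypothetical helper: a consensus match marks a candidate site only.
    """
    seq = sequence.upper()
    # m.start() is the 0-based index of the first Arg; the acceptor lies 3 residues
    # further on, so its 1-based position is m.start() + 3 + 1.
    return [m.start() + 4 for m in PKA_CONSENSUS.finditer(seq)]


print(pka_candidate_sites("LRRASLG"))  # Kemptide -> [5]
```

Kemptide (LRRASLG), a synthetic peptide commonly used as a PKA substrate, yields its serine at position 5. The zero-width lookahead is used so that overlapping candidates, as in a run of several arginines, are all reported.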

Dysfunction of kinases, e.g. caused by defective regulation, can cause
inflammatory diseases and uncontrolled proliferation. v-Src, a truncated version of
the c-Src proto-oncogene tyrosine kinase, is a classical example of this process: v-Src does
not contain the regulatory domain of the cellular gene and is thus constitutively active.

Several categories of proteins within the overall group of "Signal transduction" are
encoded by clones of the invention; they include, among others, the following:

Neurocalcin (Recoverin): Neurocalcin is a Ca2+-binding protein with three putative
Ca2+-binding domains (EF-hands). In cattle, six isoforms are differentially expressed in the
central nervous system, retina and adrenal gland. Homology with recoverin indicates
involvement in Ca2+-dependent activation of guanylate cyclase. These proteins can find
application in modulating/blocking the guanylate cyclase pathway. Diseases associated (as
potentially diagnostic, therapeutic, causative, and/or related, etc.) with these proteins
include, as reported by OMIM: 1) autosomal dominant cone dystrophy (OMIM *600364); 2)
cone dystrophy 3 (OMIM *600364); 3) cancer-associated retinopathy (OMIM +179618).
Clones in this category include: fbr2_23b21.

Proteins with a WW Domain: These proteins contain a WW domain, which was
originally described as a short conserved region in a number of unrelated proteins, among
them dystrophin, the product of the gene responsible for Duchenne muscular dystrophy. The
domain, which spans about 35 residues, is repeated up to four times in some proteins. It has
been shown to bind proteins with particular proline motifs, [AP]-P-P-[AP]-Y, and thus
somewhat resembles SH3 domains. This domain is frequently associated with other domains
typical of proteins involved in signal transduction processes. Examples of proteins containing
the WW domain are dystrophin, utrophin, vertebrate YAP protein (binds the SH3 domain of
the Yes oncoprotein), murine NEDD-4 (embryonic development and differentiation of the
central nervous system), and IQGAP (human GTPase-activating protein acting on Ras). These
proteins are therefore likely to be involved in intracellular signal transduction. Diseases
associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with these
proteins include, as reported by OMIM: 1) Muscular Dystrophy, Pseudohypertrophic
Progressive, Duchenne and Becker Types (OMIM *310200). Clones in this category include:
fbr2_23nl6.

Protein substrates for cAMP-dependent protein kinase: These proteins, which act as a
chloride channel or chloride channel inhibitor, have been associated (as potentially
diagnostic, therapeutic, causative, and/or related, etc.), as reported by OMIM, with Cystic
Fibrosis (OMIM #219700). Clones in this category include fbr2_82il7.

Sphingosine kinase: Sphingosine kinase is a new type of lipid kinase that is
regulated by growth factors. The enzyme phosphorylates sphingosine, and the product
exerts intracellular and extracellular actions. Intracellularly, sphingosine 1-phosphate (SPP)
promotes proliferation and inhibits apoptosis. In yeast, survival of cells exposed to heat shock
is dependent on SPP. Extracellularly, SPP inhibits cell motility and influences cell
morphology, effects that appear to be mediated by the G protein-coupled receptor EDG1.
These proteins have been associated (as potentially diagnostic, therapeutic, causative, and/or
related, etc.), as reported by OMIM, with Gaucher Disease, Type I (OMIM *230800). Clones
in this category include fbr2_82m6.

Vanilloid Receptors: VR1 appears to play an important role in the activation and
sensitization of nociceptors. It is the receptor for, e.g., capsaicin, a selective activator of
nociceptors and a natural product of capsicum peppers. Related proteins can find application
as targets for the development of new nociception-modulating drugs. Clones in this category
include tes3_20k2.

RCC1 (Regulator of chromosome condensation): RCC1 is a eukaryotic protein that binds
to chromatin and interacts with Ran, a nuclear GTP-binding protein. RCC1 promotes the
exchange of bound GDP for GTP, acting as a guanine-nucleotide dissociation stimulator.
These proteins can find application in the regulation of gene expression through activation
of nuclear GTP-binding proteins. X-linked retinitis pigmentosa results from a defective
GTPase regulator that contains an RCC1-type repeat. OMIM also reports that RCC1 has
associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with
retinitis pigmentosa (OMIM *312610). Clones in this category include tes3_21d4.

Ras inhibitor proteins: Ras is a signal-transducing molecule involved in the receptor
tyrosine kinase/Ras/MAP kinase signaling cascade. Ras proteins bind GDP/GTP and show
intrinsic GTPase activity. Mutations in Ras that change amino acid 12, 13 or 61 activate the
potential of Ras to transform cultured cells and are implicated in a variety of human tumours.
Ras inhibitor proteins have been associated (as potentially diagnostic, therapeutic, causative,
and/or related, etc.) with many disease processes, as reported by OMIM, including: 1)
Tumors of the lung, breast, brain, pituitary, pancreas, bone, skin, bladder, kidney, ovary,
prostate and lymphocytes, and Melanoma (OMIM *600160); 2) X-linked non-specific mental
retardation (OMIM *300104); 3) adenomatous polyposis of the colon (OMIM *175100); 4)
Beckwith-Wiedemann Syndrome (OMIM #130650); and 5) Major affective disorder 1 (OMIM
*125480). Clones in this category include utel_22g21.

Mammalian cornichon proteins involving the EGF receptor: Cornichon proteins are part
of a signal transduction pathway involving the EGF receptor. The EGF receptor has been
reported by OMIM to be associated (as potentially diagnostic, therapeutic, causative, and/or
related, etc.) with the following diseases: 1) Familial hypercholesterolemia (OMIM
143890); 2) Leprechaunism (OMIM #246200); 3) Hemophilia B (OMIM *306900); 4)
Ectodermal dysplasia 1; 5) Kartagener syndrome (OMIM *244400); and 6) Glioma of the
brain (OMIM *137800). Clones in this category include utel_22el2.
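The WW-domain ligand pattern [AP]-P-P-[AP]-Y quoted in the category list above can be searched for with a regular expression. The Python sketch below is illustrative only: the helper name and the test peptide are hypothetical, and a pattern hit does not by itself show binding to a WW domain.

```python
import re

# [AP]-P-P-[AP]-Y written as a regular expression. The capture group sits inside a
# lookahead so overlapping occurrences are all reported.
WW_LIGAND = re.compile(r"(?=([AP]PP[AP]Y))")


def ww_ligand_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched pentapeptide) for each [AP]-P-P-[AP]-Y hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in WW_LIGAND.finditer(seq)]


print(ww_ligand_motifs("GPPPAYG"))  # hypothetical peptide -> [(2, 'PPPAY')]
```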

Transmembrane proteins 

Membrane region prediction was performed using the ALOM2 software (Klein et al.,
1985; version 2 by K. Nakai). As in many other methods, the Kyte & Doolittle (1982)
amino acid hydrophobicity scale is used in ALOM2 as the primary variable for classifying
sequences in terms of their localization. High prediction accuracy is achieved through a
system of intelligent decision rules and the use of a carefully selected training data set.
The method also generates reliability estimates, which make it possible to distinguish
between membrane-spanning proteins (I, intrinsic) and globular proteins with regions of high
hydrophobicity buried in the core.

For a protein of length L, the block of length l with maximum hydrophobicity is
found:

$$\mathrm{maxH} = \max_{1 \le i \le L-l+1} \frac{1}{l} \sum_{j=i}^{i+l-1} h_j,$$

where h_j represents the hydrophobicity of an individual residue.
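The maximal-hydrophobicity search can be sketched in Python with a sliding window over the Kyte & Doolittle (1982) values. This sketch assumes that maxH is the mean, rather than the sum, of the per-residue values in the window, and it uses a default window of 17 residues, matching the window length in the discriminant example of this section; it is not the ALOM2 code itself.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def max_hydrophobicity(sequence: str, window: int = 17) -> tuple[float, int]:
    """Return (maxH, start): the highest mean KD value over any window of
    `window` residues and the window's 0-based start offset.

    Non-standard residue letters raise KeyError.
    """
    values = [KD[aa] for aa in sequence.upper()]
    if len(values) < window:
        raise ValueError("sequence shorter than window")
    current = sum(values[:window])
    best, best_start = current, 0
    for i in range(1, len(values) - window + 1):
        # Slide one residue: add the new right-hand value, drop the old left-hand one.
        current += values[i + window - 1] - values[i - 1]
        if current > best:
            best, best_start = current, i
    return best / window, best_start


# Toy sequence: a 17-residue Leu stretch flanked by Lys residues.
print(max_hydrophobicity("K" * 10 + "L" * 17 + "K" * 10))
```

The running sum keeps the scan linear in sequence length instead of re-summing every window.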



Let P(I | maxH) and P(E | maxH) be the conditional probabilities that a protein is
integral or peripheral, respectively, given its value of maximal hydrophobicity maxH, and let
P(I) and P(E) be the prior probabilities of intrinsic and extrinsic membrane proteins estimated
from the training set. Then a sequence is assigned to E if

$$P(E \mid \mathrm{maxH}) > P(I \mid \mathrm{maxH})$$

or, after applying Bayes' rule,

$$P(E)\,P(\mathrm{maxH} \mid E) > P(I)\,P(\mathrm{maxH} \mid I),$$

where the conditional probabilities P(maxH | E) and P(maxH | I) can be determined
from estimates of the probability distributions of maxH in both groups.
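The Bayes-rule assignment above can be illustrated by modeling P(maxH | E) and P(maxH | I) as normal distributions. The Gaussian form, the prior and the means and standard deviations in this Python sketch are all hypothetical placeholders, not ALOM2's trained estimates; only the decision rule mirrors the text.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution (assumed model for P(maxH | class))."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def assign_class(maxh: float, prior_e: float, mean_e: float, sd_e: float,
                 mean_i: float, sd_i: float) -> str:
    """Return 'E' if P(E)P(maxH|E) > P(I)P(maxH|I), otherwise 'I'.

    Only two classes are assumed, so P(I) = 1 - P(E).
    """
    score_e = prior_e * normal_pdf(maxh, mean_e, sd_e)
    score_i = (1.0 - prior_e) * normal_pdf(maxh, mean_i, sd_i)
    return "E" if score_e > score_i else "I"


# Hypothetical training-set estimates, for illustration only.
PARAMS = dict(prior_e=0.7, mean_e=0.8, sd_e=0.4, mean_i=2.2, sd_i=0.4)
print(assign_class(0.5, **PARAMS), assign_class(2.5, **PARAMS))  # E I
```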

Discriminant analysis simplifies this task by expressing the odds
P(E | maxH):P(I | maxH) as e^b, where b is the left-hand side of a linear or quadratic
inequality. For example, for a window of length 17, the protein is allocated to the peripheral
category E based on the empirically derived quadratic inequality

$$1.05\,(\mathrm{maxH})^2 + 12.30\,\mathrm{maxH} + 17.49 > 0,$$

whereas the optimal inequality for assigning membrane proteins (category I) is linear:

$$-9.02\,\mathrm{maxH} + 14.27 > 0.$$

The odds threshold can be made more or less stringent. For example, one can require
odds of at least 1:10 for a protein to be classified as integral. This leads to higher selectivity
but lower sensitivity.
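The effect of tightening the odds requirement can be sketched as follows. The linear discriminant coefficients in this Python example are hypothetical and are not the ALOM2 values, and reading the 1:10 requirement as "E:I odds of at most 0.1" is an assumption made for illustration.

```python
import math

# Hypothetical linear discriminant b = INTERCEPT + SLOPE * maxH, read as the
# log-odds of peripheral (E) versus integral (I). These are NOT ALOM2's coefficients.
INTERCEPT = 8.0
SLOPE = -5.0


def odds_e_to_i(maxh: float) -> float:
    """Odds P(E | maxH) : P(I | maxH) expressed as e**b."""
    return math.exp(INTERCEPT + SLOPE * maxh)


def is_integral(maxh: float, max_odds: float = 1.0) -> bool:
    """Call a protein integral when the E:I odds do not exceed max_odds.

    max_odds=1.0 is the plain Bayes decision; max_odds=0.1 demands odds of
    at least 10:1 in favor of I (one reading of the 1:10 criterion).
    """
    return odds_e_to_i(maxh) <= max_odds


for maxh in (1.5, 1.7, 2.5):
    print(maxh, is_integral(maxh), is_integral(maxh, max_odds=0.1))
```

The middle case (maxH = 1.7) shows the trade-off: it is called integral under the default rule but not under the stricter one, which gains selectivity at the cost of sensitivity.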

The boundaries of membrane-spanning regions in putative membrane proteins are
detected by means of an iterative procedure whereby the most hydrophobic region,
corresponding to the value maxH, is considered to be membrane-spanning and is removed from
the sequence. The classification procedure is then repeated for the remaining sequence,
and, if the protein is again classified as integral, the next most hydrophobic region is
considered.
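The iterative boundary search can be sketched in Python as repeatedly locating the most hydrophobic window, recording it, and splicing it out of the sequence. The per-residue values are passed in directly, and a simple mean-hydrophobicity cutoff (a hypothetical default of 1.6) stands in for the full ALOM2 integral/peripheral classification.

```python
def find_membrane_segments(values: list[float], window: int = 17,
                           cutoff: float = 1.6) -> list[tuple[int, int]]:
    """Iteratively pick the most hydrophobic window, record it and remove it.

    values: per-residue hydrophobicity values (e.g. Kyte & Doolittle).
    cutoff: hypothetical stand-in for the ALOM2 classification; a window whose
            mean exceeds it is treated as membrane-spanning.
    Returns (start, end) pairs as 0-based, inclusive indices into `values`.
    """
    # Keep original positions so segments can be reported after splicing.
    remaining = list(enumerate(values))
    segments = []
    while len(remaining) >= window:
        best_i, best_mean = 0, float("-inf")
        for i in range(len(remaining) - window + 1):
            mean = sum(v for _, v in remaining[i:i + window]) / window
            if mean > best_mean:
                best_i, best_mean = i, mean
        if best_mean <= cutoff:
            break  # the remaining sequence is no longer classified as integral
        block = remaining[best_i:best_i + window]
        segments.append((block[0][0], block[-1][0]))
        del remaining[best_i:best_i + window]  # remove the region and repeat
    return sorted(segments)


# Toy profile: two hydrophobic stretches separated by charged residues (K = -3.9).
profile = [-3.9] * 10 + [3.8] * 17 + [-3.9] * 10 + [4.5] * 17 + [-3.9] * 10
print(find_membrane_segments(profile))  # [(10, 26), (37, 53)]
```

Because each region is spliced out, a later window may join residues that were not adjacent in the original chain; this follows the text's description of removing the region from the sequence.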
Transcription factors

Purified eukaryotic RNA polymerase II (RNAPII) is unable to initiate promoter-specific
transcription. A family of factors that collectively confer RNAPII promoter specificity is
known as the general transcription factors (GTFs). They include the TATA-binding protein
(TBP), TFIIB, TFIIE, TFIIF and TFIIH. These factors are conserved among all eukaryotes.

RNAPII complexes containing the entire set of GTFs or a subset of GTFs together
with other proteins have been isolated from mammalian and yeast cells. Although purified
RNAPII and GTFs are sufficient for promoter-specific initiation, this system fails to respond
to activators. Responsiveness to activators is mediated by a further complex, termed the
mediator complex, which associates with the carboxy-terminal heptapeptide repeat domain
(CTD) of the largest subunit of RNAPII.

Purification of human RNAPII complexes and analysis of their functional properties
revealed two distinct forms of human RNAPII. One complex contained chromatin-remodeling
activities but was devoid of GTFs. The other complex did not contain chromatin-modifying
factors but contained a subset of SRB/mediator subunits, GTFs, and other polypeptides
that mediate transcriptional activation, a scenario similar to that reported for yeast.

A complex designated NAT (~20 subunits), for negative regulator of transcription, contains
RNAPII, Cdk8, homologs of the yeast mediator complex, as well as Rgr1 and Srb10/11,
which are known as negative regulators of transcription.

A complex with strikingly similar structural and functional properties to NAT has been
identified and designated SMCC (~15 subunits; SRB/mediator coactivator complex); it can
also mediate transcriptional activation.

The SMCC complex includes all reported NAT subunits, including subunits of the
TRAP complex. TRAP is a coactivator complex isolated on the basis of its interaction with
the thyroid hormone receptor. Another coactivator complex, DRIP, isolated on the basis of its
ability to interact with the vitamin D3 receptor, contains novel subunits as well as subunits of
the NAT/SMCC and TRAP complexes.



The effects of each of these coactivator complexes are dependent on the TFIID
complex. It is not known whether the TAF subunits of TFIID are required. It is likely that new
coactivator complexes containing both novel and previously defined components will be
uncovered.

Besides the large number of transcription factors that can be part of the RNAPII
holoenzyme or the coactivator complexes, there is an even larger number of specific
transcription factors that bind to promoter elements within the DNA sequence of a given gene,
leading to activation or repression of transcription. A broad range of cellular responses, such
as differentiation, proliferation, and cell death, are elicited through activating or
repressing the transcription of target genes.

There are at least five superclasses of transcription factors: 

1. Superclass contains members with characteristic basic domains:
Members are: 

Leucine zipper factors, where the basic domain is followed by a leucine zipper of 
repeated leucine residues at every seventh position. The zipper mediates protein dimerization 
as a prerequisite for DNA-binding. 

Helix-loop-helix factors (bHLH) contain a DNA-binding basic region followed by a 
motif of two potential amphipathic alpha-helices connected by a loop of variable length also 
mediating dimerization. 

Factors with a combination of helix-loop-helix and leucine zipper motifs.

Further members of this superclass are NF-1, RF-X, and bHSH like proteins. 

2. Superclass comprises factors containing zinc-coordinating DNA-binding domains. 
Members are: 
Proteins with Cys4 zinc fingers of the nuclear receptor type, where two such motifs,
differing in size, composition and function, are present in each receptor molecule. Each finger
comprises 4 cysteine residues coordinating one zinc ion. The second half, including the
second cysteine pair, has alpha-helix conformation, and the helix of the first finger binds to the
DNA through the major groove. The sequence between the first two cysteines of the second
finger mediates dimerization upon DNA binding. This class includes the steroid hormone
receptors and the thyroid hormone receptor-like factors. Other, diverse Cys4 zinc fingers have
a GATA-type motif.

Proteins with Cys2His2 zinc finger domain(s). Each finger comprises 2 cysteine and 2 
histidine residues coordinating one zinc ion, and in some cases one histidine is replaced by 
another cysteine. The zinc ion is essential for DNA-binding. 

Proteins with Cys6 cysteine-zinc cluster(s). Six cysteine residues coordinate two zinc
ions, i.e. two of the thiol groups coordinate two zinc ions each. Such clusters are present in
many fungal regulators.

Zinc fingers of alternating composition. 

3. Superclass contains factors of the helix-turn-helix type.

Members are: 

Proteins with homeodomains. Homeodomains consist of three consecutive alpha-helices.
Helix 3 mainly contacts the major groove of the DNA, although some contacts with the minor
groove are observed as well. Helices 2 and 3 resemble the helix-turn-helix structure of
prokaryotic regulators.

Proteins with Paired box domain(s). This is a DNA-binding domain of approximately
130 amino acid residues. Its N-terminal half is basic, and its C-terminal half is generally highly
charged. It probably comprises 3 alpha-helices.

Proteins with Fork head / winged helix domain(s). This domain was identified by
homology between HNF-3A and fkh. The domain comprises approximately 110 amino acids. Analysis of the
crystal structure has revealed a compact structure of three alpha-helices, the third alpha-helix
being exposed towards the major groove of the DNA. The domain also makes minor groove
contacts. Upon binding to DNA, it induces a bend of 13 degrees.



Heat shock factors 

Proteins with tryptophan clusters. The tryptophan clusters comprise several
tryptophan residues with a spacing of 12-21 amino acid residues; the subclass of myb-type
DNA-binding domains typically exhibits a spacing of 19-21 amino acid residues.
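The spacing rule above can be made concrete with a small screen over a candidate domain sequence. This is a minimal sketch under stated assumptions: "spacing" is taken to mean the number of residues lying strictly between consecutive tryptophans, every W in the input is treated as part of one cluster (so the input should be the candidate domain region only), and the `min_trp` threshold standing in for "several" is hypothetical.

```python
from typing import List


def tryptophan_gaps(seq: str) -> List[int]:
    """Count the residues lying strictly between each pair of consecutive tryptophans (W)."""
    positions = [i for i, residue in enumerate(seq.upper()) if residue == "W"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def classify_tryptophan_cluster(seq: str, min_trp: int = 3) -> str:
    """Label a sequence using the spacing ranges given in the text.

    min_trp is a hypothetical threshold standing in for "several" tryptophans.
    """
    if seq.upper().count("W") < min_trp:
        return "no cluster"
    gaps = tryptophan_gaps(seq)
    if all(19 <= g <= 21 for g in gaps):
        return "myb-type spacing"
    if all(12 <= g <= 21 for g in gaps):
        return "tryptophan cluster spacing"
    return "irregular"


# Synthetic example: poly-alanine spacers between tryptophans (not a real protein).
example = "W" + "A" * 20 + "W" + "A" * 19 + "W"
print(tryptophan_gaps(example))              # [20, 19]
print(classify_tryptophan_cluster(example))  # myb-type spacing
```

The myb-type range is checked first because it is a subset of the general 12-21 range; a sequence satisfying it would otherwise be reported only as a generic cluster.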

Proteins with TEA domain(s). The TEA domain has been identified as a region which 
is conserved among the transcription factors TEF-1, TEC1 and abaA. This domain in TEF-1 
has been shown to interact with DNA, although two additional regions may also contribute to 
DNA-binding. It is predicted to fold into three alpha-helices, with a randomly coiled region of
16-18 amino acid residues between helices 1 and 2, and a short stretch of 3-8 residues between
helices 2 and 3.
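The predicted TEA topology can be expressed as a check on the loop lengths between three helices. This is a minimal sketch assuming helix boundaries are already available as hypothetical (start, end) pairs with 0-based, end-exclusive residue indices, for example from a secondary-structure predictor; the boundaries in the example are illustrative and not taken from any real TEA domain.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), 0-based, end-exclusive residue indices.
Helix = Tuple[int, int]


def loop_lengths(helices: List[Helix]) -> List[int]:
    """Residues between consecutive helices, after sorting by start position."""
    ordered = sorted(helices)
    return [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]


def matches_tea_topology(helices: List[Helix]) -> bool:
    """True if three helices are separated by a 16-18 residue coil and a 3-8 residue stretch."""
    if len(helices) != 3:
        return False
    loop_1_2, loop_2_3 = loop_lengths(helices)
    return 16 <= loop_1_2 <= 18 and 3 <= loop_2_3 <= 8


# Illustrative boundaries only: loops of 17 and 5 residues.
print(matches_tea_topology([(0, 10), (27, 40), (45, 60)]))  # True
```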

4. Superclass contains beta-Scaffold Factors with Minor Groove Contacts 
Members are: 

Proteins with RHR (Rel homology) region. 

The structure of the Rel-type DBD exhibits a bipartite subdomain structure, each 
subdomain comprising a beta-barrel with five loops that form an extensive contact surface to 
the major groove of the DNA. In particular, the first loop of the N-terminal subdomain (the
highly conserved recognition loop) makes contacts with the recognition element on the
DNA, although other loops are also involved. The fact that the main DNA-contacts are made through
loops has been suggested to provide a high degree of flexibility in binding to a range of
different target sequences. Augmenting interactions are achieved by two alpha-helices within
the N-terminal part that form strong minor groove contacts to the A/T-rich center of the κB
element. In p65, the sequence between the two alpha-helices is much shorter and helix 2 is
truncated. The second, C-terminal subdomain is necessary mainly for protein dimerization.

p53 proteins 
MADS (MCM1-agamous-deficiens-SRF) box proteins. Proteins of this class comprise
a region of homology, the MADS box. The DNA-binding domain also mediates dimerization.
In the DNA-bound dimer (shown for SRF), two antiparallel amphipathic alpha-helices (alpha-I)
form a coiled coil and lie approximately parallel on the minor groove. These
helices make minor and major groove contacts; the N-terminal extensions form additional minor groove
contacts. The bound DNA is bent and wrapped around the protein. It exhibits a compressed
minor groove in the center and a widened minor groove in the flanks.

Beta-Barrel alpha-helix transcription factors. 

TATA-binding proteins 

HMG proteins 

Proteins of this class comprise a region of homology with the chromosomal non-histone
HMG proteins such as HMG1. This region comprises the DNA-binding domain,
which in some instances, such as HMG1, mediates sequence-nonspecific binding to DNA and in
other cases, such as LEF-1, sequence-specific binding. This domain exhibits a typical L-shaped
conformation made up of 3 alpha-helices and an extended N-terminal strand. The latter,
together with helix 3, which contains a kink, forms the long arm of the L,
whereas helices 1 and 2 form the short arm. Binding to the minor groove induces a sharp
bend of more than 90 degrees in the DNA, away from the bound protein. The overall
topology of the DNA-protein complexes somewhat resembles that of the TBP-TATA box
complex.

Heteromeric CCAAT factors 

Proteins with Grainyhead domain(s) 

Cold-shock domain factors. Cold-shock domain (CSD) proteins are characterized by a highly
conserved region first found in prokaryotic cold-shock proteins. This domain is a single-stranded
nucleic acid-binding structure that interacts with DNA or RNA. It consists of an
antiparallel five-stranded beta-barrel, the strands of which are connected by turns and loops.
Within this structure, a three-stranded beta-sheet contains a conserved RNA-binding motif,
RNP1. Not all CSD proteins are transcription factors. Those which specifically bind to a
certain sequence are termed Y-box proteins. Proteins of this class were previously called
protamine-like domain proteins because they have a highly positively charged domain with
interspersed proline residues.

Proteins with Runt homology domain 

The members of this transcription factor class have been identified on the basis of
their homology to a defined region within the Drosophila protein Runt. The runt domain is
part of the DNA-binding domain of these factors. It consists mainly of beta-strands, contains
no alpha-helical regions, and appears most similar to the palm domain found in DNA
polymerase beta (rat).

5. Superclass contains other transcription factors, such as copper fist proteins, HMGI(Y),
STAT, pocket domain proteins and AP2/EREBP-related factors.

The classification of transcription factors originates from the TRANSFAC database:

http://transfac.gbf.de/TRANSFAC/

Reference: Heinemeyer 

Within the overall group of "Transcription Factors," clones of the invention encode several
categories of proteins, including, among others, the following:

Dcoh: Dcoh is a bifunctional protein, complexed with biopterin. It serves as the
dimerization cofactor of hepatocyte nuclear factor-1 and catalyzes the dehydration of the
biopterin cofactor of phenylalanine hydroxylase. The Dcoh protein has been reported by
OMIM to be associated (as potentially diagnostic, therapeutic, causative, and/or related,
etc.) with the following diseases: 1) hyperphenylalaninemia (OMIM 126090, #264070).
Clones in this category include fkd2_46kl2.

Signal transducing proteins: Beta-transducin subunits of G-proteins contain WD-40
repeats. The beta subunits seem to be required for the replacement of GDP by GTP as well as
for membrane anchoring and receptor recognition. Because of its zinc finger, the novel protein
appears to be a new molecule involved in both signal transduction and transcription. These proteins
have been reported by OMIM to be associated (as potentially diagnostic, therapeutic,
causative, and/or related, etc.) with the following diseases: 1) essential hypertension
(OMIM *139130). Clones in this category include utel_li2.
* * *

The invention, therefore, specifically contemplates the following assemblages of 
materials, which track the above-identified fourteen functional groupings, that are useful in 
practicing the profiling aspects of the invention. One type of assemblage is nucleic acid- 
based and can include the following groupings of sequences and their derivatives: all 
sequences; human fetal brain sequences; brain derived sequences; human fetal kidney 
library sequences; kidney derived sequences; human mammary carcinoma library 
sequences; mammary carcinoma derived sequences; human testis library sequences; testes 
derived sequences; cell cycle genes; cell structure and motility genes; differentiation and 
development genes; intracellular transport and trafficking genes; metabolism genes; nucleic 
acid management genes; signal transduction genes; transmembrane protein genes; and 
transcription factor genes. Other assemblages contain proteins or their corresponding 
antibodies or antibody fragments, divided along the same groupings. 

Database Applications 

Because they are human genes and gene products, the inventive molecules are useful
as members of a database. Such a database may be used, for example, in drug discovery
and rational drug design or in testing the novelty and non-obviousness of newly sequenced
materials. In addition, the molecules are particularly suited for designing variants for the profiling
(and other) applications described herein. Hence, the following discussion of electronic
embodiments applies equally to such variants, which, naturally, will be generated and
stored on a computer using known methodologies.

Accordingly, one aspect of the invention contemplates a database of at least one of
the inventive sequences stored on computer readable media. Again, the individual
sequences may be grouped according to the functional and structural groups
mentioned above. While the individual sequences of a database may exist in printed form,
they are preferably in electronic form, as in an ASCII text file. They may also exist as
word processing files, or they may be stored in database applications like DB2, Sybase,
Oracle, GCG and GenBank. One skilled in the art will understand the range of applications
suitable for using and storing the electronic embodiments of the invention.
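One way to hold sequences grouped by functional category, and to export a group as an ASCII text file, is sketched below. This is an illustration only: it swaps in Python's built-in sqlite3 module for the commercial database applications named above, uses FASTA as an assumed plain-text format, and the table layout, clone identifiers and sequences are hypothetical placeholders rather than sequences of the invention.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical record layout: (clone_id, functional_group, sequence).
Record = Tuple[str, str, str]


def build_database(records: Iterable[Record]) -> sqlite3.Connection:
    """Load records into an in-memory SQLite table indexed by functional group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences ("
        "clone_id TEXT PRIMARY KEY, "
        "functional_group TEXT NOT NULL, "
        "sequence TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_group ON sequences (functional_group)")
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def clones_in_group(conn: sqlite3.Connection, group: str) -> List[str]:
    """Return the clone identifiers assigned to one functional group, sorted."""
    rows = conn.execute(
        "SELECT clone_id FROM sequences WHERE functional_group = ? ORDER BY clone_id",
        (group,),
    )
    return [row[0] for row in rows]


def to_fasta(conn: sqlite3.Connection, group: str, width: int = 60) -> str:
    """Export one functional group as a plain-text (ASCII) FASTA string."""
    lines: List[str] = []
    rows = conn.execute(
        "SELECT clone_id, sequence FROM sequences "
        "WHERE functional_group = ? ORDER BY clone_id",
        (group,),
    )
    for clone_id, seq in rows:
        lines.append(f">{clone_id} {group}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""


# Placeholder data only; identifiers and sequences are hypothetical.
conn = build_database([
    ("clone_A", "transcription factor genes", "ACGT" * 20),
    ("clone_B", "metabolism genes", "GGCC"),
    ("clone_C", "transcription factor genes", "TTAA"),
])
print(clones_in_group(conn, "transcription factor genes"))  # ['clone_A', 'clone_C']
```

Indexing on the functional group keeps lookups by grouping cheap, which matches the way the assemblages above are organized; the same schema could be ported to any of the listed database products.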

"Computer readable media" refers to any medium which can be read and accessed 
by a computer. These include: magnetic storage media, like floppy discs, hard drives and 
magnetic tape; optical storage media, like CD-ROM; electrical storage media, like RAM 
and ROM; and hybrids of these categories, like magnetic/optical storage media. One
skilled in the art will readily understand the scope of computer readable media and how to 
implement them. 

Biological Activities and Assays for Implementing Therapeutic and Diagnostic 
Applications 

This section provides assays useful for characterizing and quantifying the biological activity
of the inventive molecules and their derivatives, an activity that is relevant to their
pharmacological effects. As used in this section, "protein" may also refer to the
inventive antibodies (including fragments).

Cytokine and Cell Proliferation/Differentiation Activity 

A protein of the present invention may exhibit cytokine, cell proliferation (either 
inducing or inhibiting) or cell differentiation (either inducing or inhibiting) activity or may 
induce production of other cytokines in certain cell populations. Many protein factors 
discovered to date, including all known cytokines, have exhibited activity in one or more 
factor dependent cell proliferation assays, and hence the assays serve as a convenient 
confirmation of cytokine activity. The activity of a protein of the present invention is 
evidenced by any one of a number of routine factor dependent cell proliferation assays for 
cell lines including, without limitation, 32D, DA2, DA1G, T10, B9, B9/11, BaF3,
MC9/G, M+ (preB M+), 2E8, RB5, DA1, 123, T1165, HT2, CTLL2, TF-1, Mo7e and
CMK.

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for T-cell or thymocyte proliferation include without limitation those 
described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D. 
H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing Associates and Wiley- 
Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte Function 3.1-3.19; Chapter 
7, Immunologic studies in Humans); Takai et al., J. Immunol. 137:3494-3500, 1986; 
Bertagnolli et al., J. Immunol. 145:1706-1712, 1990; Bertagnolli et al., Cellular 
Immunology 133:327-341, 1991; Bertagnolli et al., J. Immunol. 149:3778-3783, 1992;
Bowman et al., J. Immunol. 152:1756-1761, 1994.
Assays for cytokine production and/or proliferation of spleen cells, lymph node cells
or thymocytes include, without limitation, those described in: Polyclonal T cell stimulation,
Kruisbeek, A. M. and Shevach, E. M. In Current Protocols in Immunology. J. E. e.a.
Coligan eds. Vol 1 pp. 3.12.1-3.12.14, John Wiley and Sons, Toronto. 1994; and
Measurement of mouse and human interferon gamma, Schreiber, R. D. In Current
Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.8.1-6.8.8, John Wiley and
Sons, Toronto. 1994.

Assays for proliferation and differentiation of hematopoietic and lymphopoietic cells
include, without limitation, those described in: Measurement of Human and Murine
Interleukin 2 and Interleukin 4, Bottomly, K., Davis, L. S. and Lipsky, P. E. In Current
Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.3.1-6.3.12, John Wiley and
Sons, Toronto. 1991; deVries et al., J. Exp. Med. 173:1205-1211, 1991; Moreau et al.,
Nature 336:690-692, 1988; Greenberger et al., Proc. Natl. Acad. Sci. U.S.A. 80:2931-2938,
1983; Measurement of mouse and human interleukin 6, Nordan, R. In Current
Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.6.1-6.6.5, John Wiley and
Sons, Toronto. 1991; Smith et al., Proc. Natl. Acad. Sci. U.S.A. 83:1857-1861, 1986;
Measurement of human Interleukin 11, Bennett, F., Giannotti, J., Clark, S. C. and Turner,
K. J. In Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.15.1, John
Wiley and Sons, Toronto. 1991; Measurement of mouse and human Interleukin 9, Ciarletta,
A., Giannotti, J., Clark, S. C. and Turner, K. J. In Current Protocols in Immunology. J.
E. e.a. Coligan eds. Vol 1 pp. 6.13.1, John Wiley and Sons, Toronto. 1991.

Assays for T-cell clone responses to antigens (which will identify, among others, 
proteins that affect APC-T cell interactions as well as direct T-cell effects by measuring 
proliferation and cytokine production) include, without limitation, those described in: 
Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D. H.
Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing Associates and
Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte Function; Chapter 6,
Cytokines and their cellular receptors; Chapter 7, Immunologic studies in Humans);
Weinberger et al., Proc. Natl. Acad. Sci. USA 77:6091-6095, 1980; Weinberger et al.,
Eur. J. Immun. 11:405-411, 1981; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai
et al., J. Immunol. 140:508-512, 1988.
Immune Stimulating or Suppressing Activity

A protein of the present invention may also exhibit immune stimulating or immune
suppressing activity, including without limitation the activities for which assays are
described herein. A protein may be useful in the treatment of various immune deficiencies
and disorders (including severe combined immunodeficiency (SCID)), e.g., in regulating
(up or down) growth and proliferation of T and/or B lymphocytes, as well as affecting the
cytolytic activity of NK cells and other cell populations. These immune deficiencies may be
genetic or be caused by viral (e.g., HIV) as well as bacterial or fungal infections, or may
result from autoimmune disorders. More specifically, infectious diseases caused by viral,
bacterial, fungal or other infection may be treatable using a protein of the present invention,
including infections by HIV, hepatitis viruses, herpesviruses, mycobacteria, Leishmania
spp., malaria spp. and various fungal infections such as candidiasis. Of course, in this
regard, a protein of the present invention may also be useful where a boost to the immune
system generally may be desirable, e.g., in the treatment of cancer.

Autoimmune disorders which may be treated using a protein of the present invention 
include, for example, connective tissue disease, multiple sclerosis, systemic lupus 
erythematosus, rheumatoid arthritis, autoimmune pulmonary inflammation, Guillain-Barre 
syndrome, autoimmune thyroiditis, insulin dependent diabetes mellitus, myasthenia gravis,
graft-versus-host disease and autoimmune inflammatory eye disease. Such a protein of the
present invention may also be useful in the treatment of allergic reactions and conditions,
such as asthma (particularly allergic asthma) or other respiratory problems. Other
conditions in which immune suppression is desired (including, for example, organ
transplantation) may also be treatable using a protein of the present invention.

Using the proteins of the invention, it may also be possible to modify immune
responses in a number of ways. Down regulation may be in the form of inhibiting or
blocking an immune response already in progress or may involve preventing the induction
of an immune response. The functions of activated T cells may be inhibited by suppressing
T cell responses or by inducing specific tolerance in T cells, or both. Immunosuppression
of T cell responses is generally an active, non-antigen-specific process which requires
continuous exposure of the T cells to the suppressive agent. Tolerance, which involves
inducing non-responsiveness or anergy in T cells, is distinguishable from
immunosuppression in that it is generally antigen-specific and persists after exposure to the
tolerizing agent has ceased. Operationally, tolerance can be demonstrated by the lack of a T
cell response upon reexposure to specific antigen in the absence of the tolerizing agent.

Down regulating or preventing one or more antigen functions (including without 
limitation B lymphocyte antigen functions (such as, for example, B7)), e.g., preventing 
high level lymphokine synthesis by activated T cells, will be useful in situations of tissue, 
skin and organ transplantation and in graft-versus-host disease (GVHD). For example, 
blockade of T cell function should result in reduced tissue destruction in tissue
transplantation. Typically, in tissue transplants, rejection of the transplant is initiated 
through its recognition as foreign by T cells, followed by an immune reaction that destroys 
the transplant. The administration of a molecule which inhibits or blocks interaction of a B7 
lymphocyte antigen with its natural ligand(s) on immune cells (such as a soluble, 
monomeric form of a peptide having B7-2 activity alone or in conjunction with a 
monomeric form of a peptide having an activity of another B lymphocyte antigen (e.g., B7-1,
B7-3) or blocking antibody), prior to transplantation can lead to the binding of the
molecule to the natural ligand(s) on the immune cells without transmitting the
corresponding costimulatory signal. Blocking B lymphocyte antigen function in this manner
prevents cytokine synthesis by immune cells, such as T cells, and thus acts as an 
immunosuppressant. Moreover, the lack of costimulation may also be sufficient to anergize 
the T cells, thereby inducing tolerance in a subject. Induction of long-term tolerance by B 
lymphocyte antigen-blocking reagents may avoid the necessity of repeated administration of 
these blocking reagents. To achieve sufficient immunosuppression or tolerance in a subject, 
it may also be necessary to block the function of a combination of B lymphocyte antigens. 

The efficacy of particular blocking reagents in preventing organ transplant rejection 
or GVHD can be assessed using animal models that are predictive of efficacy in humans. 
Examples of appropriate systems which can be used include allogeneic cardiac grafts in rats 
and xenogeneic pancreatic islet cell grafts in mice, both of which have been used to 
examine the immunosuppressive effects of CTLA4Ig fusion proteins in vivo as described in 
Lenschow et al., Science 257:789-792 (1992) and Turka et al., Proc. Natl. Acad. Sci. USA,
89:11102-11105 (1992). In addition, murine models of GVHD (see Paul ed., Fundamental 
Immunology, Raven Press, New York, 1989, pp. 846-847) can be used to determine the 
effect of blocking B lymphocyte antigen function in vivo on the development of that 
disease. 
Blocking antigen function may also be therapeutically useful for treating
autoimmune diseases. Many autoimmune disorders are the result of inappropriate activation 
of T cells that are reactive against self tissue and which promote the production of cytokines 
and autoantibodies involved in the pathology of the diseases. Preventing the activation of 
autoreactive T cells may reduce or eliminate disease symptoms. Administration of reagents 
which block costimulation of T cells by disrupting receptor:ligand interactions of B
lymphocyte antigens can be used to inhibit T cell activation and prevent production of 
autoantibodies or T cell-derived cytokines which may be involved in the disease process. 
Additionally, blocking reagents may induce antigen-specific tolerance of autoreactive T 
cells which could lead to long-term relief from the disease. The efficacy of blocking 
reagents in preventing or alleviating autoimmune disorders can be determined using a 
number of well-characterized animal models of human autoimmune diseases. Examples 
include murine experimental autoimmune encephalitis, systemic lupus erythematosus in
MRL/lpr/lpr mice or NZB hybrid mice, murine autoimmune collagen arthritis, diabetes 
mellitus in NOD mice and BB rats, and murine experimental myasthenia gravis (see Paul 
ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 840-856). 

Upregulation of an antigen function (preferably a B lymphocyte antigen function), as
a means of upregulating immune responses, may also be useful in therapy. Upregulation of
immune responses may be in the form of enhancing an existing immune response or
eliciting an initial immune response. For example, enhancing an immune response through
stimulating B lymphocyte antigen function may be useful in cases of viral infection. In
addition, systemic viral diseases such as influenza, the common cold, and encephalitis
might be alleviated by the systemic administration of stimulatory forms of B lymphocyte
antigens.

Alternatively, anti-viral immune responses may be enhanced in an infected patient
by removing T cells from the patient, costimulating the T cells in vitro with viral antigen- 
pulsed APCs either expressing a peptide of the present invention or together with a 
stimulatory form of a soluble peptide of the present invention and reintroducing the in vitro 
activated T cells into the patient. Another method of enhancing anti-viral immune responses 
would be to isolate infected cells from a patient, transfect them with a nucleic acid encoding 
a protein of the present invention as described herein such that the cells express all or a 
portion of the protein on their surface, and reintroduce the transfected cells into the patient. 
The infected cells would now be capable of delivering a costimulatory signal to, and
thereby activate, T cells in vivo. 

In another application, up regulation or enhancement of antigen function (preferably 
B lymphocyte antigen function) may be useful in the induction of tumor immunity. Tumor 
cells (e.g., sarcoma, melanoma, lymphoma, leukemia, neuroblastoma, carcinoma) 
transfected with a nucleic acid encoding at least one peptide of the present invention can be 
administered to a subject to overcome tumor-specific tolerance in the subject. If desired, the 
tumor cell can be transfected to express a combination of peptides. For example, tumor 
cells obtained from a patient can be transfected ex vivo with an expression vector directing 
the expression of a peptide having B7-2-like activity alone, or in conjunction with a peptide 
having B7-1-like activity and/or B7-3-like activity. The transfected tumor cells are returned
to the patient to result in expression of the peptides on the surface of the transfected cell. 
Alternatively, gene therapy techniques can be used to target a tumor cell for transfection in 
vivo. 

The presence of the peptide of the present invention having the activity of a B 
lymphocyte antigen(s) on the surface of the tumor cell provides the necessary costimulation 
signal to T cells to induce a T cell mediated immune response against the transfected tumor 
cells. In addition, tumor cells which lack MHC class I or MHC class II molecules, or
which fail to reexpress sufficient amounts of MHC class I or MHC class II molecules, can
be transfected with nucleic acid encoding all or a portion (e.g., a cytoplasmic-domain
truncated portion) of an MHC class I alpha chain protein and beta 2 microglobulin protein
or an MHC class II alpha chain protein and an MHC class II beta chain protein to thereby 
express MHC class I or MHC class II proteins on the cell surface. Expression of the 
appropriate class I or class II MHC in conjunction with a peptide having the activity of a B 
lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T cell mediated immune response 
against the transfected tumor cell. Optionally, a gene encoding an antisense construct which 
blocks expression of an MHC class II associated protein, such as the invariant chain, can 
also be cotransfected with a DNA encoding a peptide having the activity of a B lymphocyte 
antigen to promote presentation of tumor associated antigens and induce tumor specific 
immunity. Thus, the induction of a T cell mediated immune response in a human subject 
may be sufficient to overcome tumor-specific tolerance in the subject. 
The activity of a protein of the invention may, among other means, be measured by
the following methods: 

Suitable assays for thymocyte or splenocyte cytotoxicity include, without limitation,
those described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M.
Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing
Associates and Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte
Function 3.1-3.19; Chapter 7, Immunologic studies in Humans); Herrmann et al., Proc.
Natl. Acad. Sci. USA 78:2488-2492, 1981; Herrmann et al., J. Immunol. 128:1968-1974,
1982; Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et al., J. Immunol. 137:3494-3500,
1986; Takai et al., J. Immunol. 140:508-512, 1988; Bowman et al., J. Virology 61:1992-1998;
Bertagnolli et al., Cellular Immunology 133:327-341, 1991; Brown et al., J.
Immunol. 153:3079-3092, 1994.

Assays for T-cell-dependent immunoglobulin responses and isotype switching 
(which will identify, among others, proteins that modulate T-cell dependent antibody 
responses and that affect Th1/Th2 profiles) include, without limitation, those described in:
Maliszewski, J. Immunol. 144:3028-3033, 1990; and Assays for B cell function: In vitro 
antibody production, Mond, J. J. and Brunswick, M. In Current Protocols in Immunology. 
J. E. e.a. Coligan eds. Vol 1 pp. 3.8.1-3.8.16, John Wiley and Sons, Toronto. 1994. 

Mixed lymphocyte reaction (MLR) assays (which will identify, among others, 
proteins that generate predominantly Th1 and CTL responses) include, without limitation,
those described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. 
Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing 
Associates and Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte 
Function 3.1-3.19; Chapter 7, Immunologic studies in Humans); Takai et al., J. Immunol. 
137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bertagnolli et al., J. 
Immunol. 149:3778-3783, 1992. 

Dendritic cell-dependent assays (which will identify, among others, proteins 
expressed by dendritic cells that activate naive T-cells) include, without limitation, those 
described in: Guery et al., J. Immunol. 134:536-544, 1995; Inaba et al., Journal of 
Experimental Medicine 173:549-559, 1991; Macatonia et al., Journal of Immunology
154:5071-5079, 1995; Porgador et al., Journal of Experimental Medicine 182:255-260, 
1995; Nair et al., Journal of Virology 67:4062-4069, 1993; Huang et al., Science 264:961- 
965, 1994; Macatonia et al., Journal of Experimental Medicine 169:1255-1264, 1989; 
Bhardwaj et al., Journal of Clinical Investigation 94:797-807, 1994; and Inaba et al., 
Journal of Experimental Medicine 172:631-640, 1990. 

Assays for lymphocyte survival/apoptosis (which will identify, among others, 
proteins that prevent apoptosis after superantigen induction and proteins that regulate 
lymphocyte homeostasis) include, without limitation, those described in: Darzynkiewicz et 
al., Cytometry 13:795-808, 1992; Gorczyca et al., Leukemia 7:659-670, 1993; Gorczyca et 
al., Cancer Research 53:1945-1951, 1993; Itoh et al., Cell 66:233-243, 1991; Zacharchuk, 
Journal of Immunology 145:4037-4045, 1990; Zamai et al., Cytometry 14:891-897, 1993; 
Gorczyca et al., International Journal of Oncology 1:639-648, 1992. 

Assays for proteins that influence early steps of T-cell commitment and development 
include, without limitation, those described in: Antica et al., Blood 84:111-117, 1994; Fine 
et al., Cellular Immunology 155:111-122, 1994; Galy et al., Blood 85:2770-2778, 1995; 
Toki et al., Proc. Natl. Acad. Sci. USA 88:7548-7551, 1991.
Hematopoiesis Regulating Activity

A protein of the present invention may be useful in regulation of hematopoiesis and, 
consequently, in the treatment of myeloid or lymphoid cell deficiencies. Even marginal 
biological activity in support of colony forming cells or of factor-dependent cell lines 
indicates involvement in regulating hematopoiesis, e.g. in supporting the growth and 
proliferation of erythroid progenitor cells alone or in combination with other cytokines, 
thereby indicating utility, for example, in treating various anemias or for use in conjunction 
with irradiation/chemotherapy to stimulate the production of erythroid precursors and/or 
erythroid cells; in supporting the growth and proliferation of myeloid cells such as 
granulocytes and monocytes/macrophages (i.e., traditional CSF activity) useful, for 
example, in conjunction with chemotherapy to prevent or treat consequent myelo- 
suppression; in supporting the growth and proliferation of megakaryocytes and 
consequently of platelets thereby allowing prevention or treatment of various platelet 
disorders such as thrombocytopenia, and generally for use in place of or complementary to
platelet transfusions; and/or in supporting the growth and proliferation of hematopoietic 
stem cells which are capable of maturing to any and all of the above-mentioned 
hematopoietic cells and therefore find therapeutic utility in various stem cell disorders (such 
as those usually treated with transplantation, including, without limitation, aplastic anemia 
and paroxysmal nocturnal hemoglobinuria), as well as in repopulating the stem cell 
compartment post irradiation/chemotherapy, either in-vivo or ex-vivo (i.e., in conjunction 
with bone marrow transplantation or with peripheral progenitor cell transplantation 
(homologous or heterologous)) as normal cells or genetically manipulated for gene therapy. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Suitable assays for proliferation and differentiation of various hematopoietic lines 
are cited above. 

Assays for embryonic stem cell differentiation (which will identify, among others,
proteins that influence embryonic hematopoietic differentiation) include, without limitation,
those described in: Johansson et al., Cellular Biology 15:141-151, 1995; Keller et al.,
Molecular and Cellular Biology 13:473-486, 1993; McClanahan et al., Blood 81:2903-2915,
1993.
Assays for stem cell survival and differentiation (which will identify, among others,
proteins that regulate lympho-hematopoiesis) include, without limitation, those described
in: Methylcellulose colony forming assays, Freshney, M. G. In Culture of Hematopoietic
Cells. R. I. Freshney, et al. eds. Vol pp. 265-268, Wiley-Liss, Inc., New York, N.Y.
1994; Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911, 1992; Primitive
hematopoietic colony forming cells with high proliferative potential, McNiece, I. K. and
Briddell, R. A. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp.
23-39, Wiley-Liss, Inc., New York, N.Y. 1994; Neben et al., Experimental Hematology
22:353-359, 1994; Cobblestone area forming cell assay, Ploemacher, R. E. In Culture of
Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 1-21, Wiley-Liss, Inc., New York,
N.Y. 1994; Long term bone marrow cultures in the presence of stromal cells, Spooncer,
E., Dexter, M. and Allen, T. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds.
Vol pp. 163-179, Wiley-Liss, Inc., New York, N.Y. 1994; Long term culture initiating cell
assay, Sutherland, H. J. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. Vol
pp. 139-162, Wiley-Liss, Inc., New York, N.Y. 1994.

Tissue Growth Activity 

A protein of the present invention also may have utility in compositions used for 
bone, cartilage, tendon, ligament and/or nerve tissue growth or regeneration, as well as for 
wound healing and tissue repair and replacement, and in the treatment of burns, incisions 
and ulcers. 

A protein of the present invention, which induces cartilage and/or bone growth in 
circumstances where bone is not normally formed, has application in the healing of bone 
fractures and cartilage damage or defects in humans and other animals. Such a preparation 
employing a protein of the invention may have prophylactic use in closed as well as open 
fracture reduction and also in the improved fixation of artificial joints. De novo bone
formation induced by an osteogenic agent contributes to the repair of congenital,
trauma-induced, or oncologic resection-induced craniofacial defects, and is also useful in
cosmetic plastic surgery.

A protein of this invention may also be used in the treatment of periodontal disease, 
and in other tooth repair processes. Such agents may provide an environment to attract 
bone-forming cells, stimulate growth of bone-forming cells or induce differentiation of 
progenitors of bone-forming cells. A protein of the invention may also be useful in the 
treatment of osteoporosis or osteoarthritis, such as through stimulation of bone and/or
cartilage repair or by blocking inflammation or processes of tissue destruction (collagenase 
activity, osteoclast activity, etc.) mediated by inflammatory processes. 

Another category of tissue regeneration activity that may be attributable to the 
protein of the present invention is tendon/ligament formation. A protein of the present 
invention, which induces tendon/ligament-like tissue or other tissue formation in 
circumstances where such tissue is not normally formed, has application in the healing of 
tendon or ligament tears, deformities and other tendon or ligament defects in humans and 
other animals. Such a preparation employing a tendon/ligament-like tissue inducing protein 
may have prophylactic use in preventing damage to tendon or ligament tissue, as well as 
use in the improved fixation of tendon or ligament to bone or other tissues, and in repairing 
defects in tendon or ligament tissue. De novo tendon/ligament-like tissue formation induced
by a composition of the present invention contributes to the repair of congenital,
trauma-induced, or other tendon or ligament defects, and is also useful in cosmetic
plastic surgery for attachment or repair of tendons or ligaments. The compositions of the
present invention may provide an environment to attract tendon- or ligament-forming cells,
stimulate growth of tendon- or ligament-forming cells, induce differentiation of progenitors 
of tendon- or ligament-forming cells, or induce growth of tendon/ligament cells or 
progenitors ex vivo for return in vivo to effect tissue repair. The compositions of the 
invention may also be useful in the treatment of tendonitis, carpal tunnel syndrome and 
other tendon or ligament defects. The compositions may also include an appropriate matrix 
and/or sequestering agent as a carrier as is well known in the art. 

The protein of the present invention may also be useful for proliferation of neural 
cells and for regeneration of nerve and brain tissue, i.e. for the treatment of central and 
peripheral nervous system diseases and neuropathies, as well as mechanical and traumatic 
disorders, which involve degeneration, death or trauma to neural cells or nerve tissue. 
More specifically, a protein may be used in the treatment of diseases of the peripheral 
nervous system, such as peripheral nerve injuries, peripheral neuropathy and localized 
neuropathies, and central nervous system diseases, such as Alzheimer's, Parkinson's 
disease, Huntington's disease, amyotrophic lateral sclerosis, and Shy-Drager syndrome. 
Further conditions which may be treated in accordance with the present invention include 
mechanical and traumatic disorders, such as spinal cord disorders, head trauma and 
cerebrovascular diseases such as stroke. Peripheral neuropathies resulting from
chemotherapy or other medical therapies may also be treatable using a protein of the 
invention. 

Proteins of the invention may also be useful to promote better or faster closure of 
non-healing wounds, including without limitation pressure ulcers, ulcers associated with 
vascular insufficiency, surgical and traumatic wounds, and the like. 

It is expected that a protein of the present invention may also exhibit activity for 
generation or regeneration of other tissues, such as organs (including, for example, 
pancreas, liver, intestine, kidney, skin, endothelium), muscle (smooth, skeletal or cardiac) 
and vascular (including vascular endothelium) tissue, or for promoting the growth of cells 
comprising such tissues. Some of the desired effects may be achieved by inhibition or modulation of
fibrotic scarring to allow normal tissue to regenerate. A protein of the invention may also 
exhibit angiogenic activity. 

A protein of the present invention may also be useful for gut protection or 
regeneration and treatment of lung or liver fibrosis, reperfusion injury in various tissues, 
and conditions resulting from systemic cytokine damage. 

A protein of the present invention may also be useful for promoting or inhibiting 
differentiation of tissues described above from precursor tissues or cells; or for inhibiting 
the growth of tissues described above. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for tissue generation activity include, without limitation, those described in: 
International Patent Publication No. WO95/16035 (bone, cartilage, tendon); International 
Patent Publication No. WO95/05846 (nerve, neuronal); International Patent Publication 
No. WO91/07491 (skin, endothelium). 

Assays for wound healing activity include, without limitation, those described in: 
Winter, Epidermal Wound Healing, pp. 71-112 (Maibach, H. I. and Rovee, D. T., eds.),
Year Book Medical Publishers, Inc., Chicago, as modified by Eaglstein and Mertz, J.
Invest. Dermatol. 71:382-84 (1978).

Activin/Inhibin Activity 

A protein of the present invention may also exhibit activin- or inhibin-related 
activities. Inhibins are characterized by their ability to inhibit the release of follicle 
stimulating hormone (FSH), while activins are characterized by their ability to
stimulate the release of follicle stimulating hormone (FSH). Thus, a protein of the present 
invention, alone or in heterodimers with a member of the inhibin alpha family, may be 
useful as a contraceptive based on the ability of inhibins to decrease fertility in female 
mammals and decrease spermatogenesis in male mammals. Administration of sufficient 
amounts of other inhibins can induce infertility in these mammals. Alternatively, the protein 
of the invention, as a homodimer or as a heterodimer with other protein subunits of the 
inhibin-beta group, may be useful as a fertility-inducing therapeutic, based upon the ability
of activin molecules to stimulate FSH release from cells of the anterior pituitary. See, for
example, U.S. Pat. No. 4,798,885. A protein of the invention may also be useful for
advancement of the onset of fertility in sexually immature mammals, so as to increase the
lifetime reproductive performance of domestic animals such as cows, sheep and pigs.

The activity of a protein of the invention may, among other means, be measured by
the following methods: 

Assays for activin/inhibin activity include, without limitation, those described in: 
Vale et al., Endocrinology 91:562-572, 1972; Ling et al., Nature 321:779-782, 1986; Vale 
et al., Nature 321:776-779, 1986; Mason et al., Nature 318:659-663, 1985; Forage et al., 
Proc. Natl. Acad. Sci. USA 83:3091-3095, 1986. 

Chemotactic/Chemokinetic Activity 

A protein of the present invention may have chemotactic or chemokinetic activity 
(e.g., act as a chemokine) for mammalian cells, including, for example, monocytes, 
fibroblasts, neutrophils, T-cells, mast cells, eosinophils, epithelial and/or endothelial cells. 
Chemotactic and chemokinetic proteins can be used to mobilize or attract a desired cell 
population to a desired site of action. Chemotactic or chemokinetic proteins provide 
particular advantages in treatment of wounds and other trauma to tissues, as well as in 
treatment of localized infections. For example, attraction of lymphocytes, monocytes or 
neutrophils to tumors or sites of infection may result in improved immune responses against 
the tumor or infecting agent. 

A protein or peptide has chemotactic activity for a particular cell population if it can 
stimulate, directly or indirectly, the directed orientation or movement of such cell 
population. Preferably, the protein or peptide has the ability to directly stimulate directed 
movement of cells. Whether a particular protein has chemotactic activity for a population of 
cells can be readily determined by employing such protein or peptide in any known assay
for cell chemotaxis. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for chemotactic activity (which will identify proteins that induce or prevent 
chemotaxis) consist of assays that measure the ability of a protein to induce the migration of 
cells across a membrane as well as the ability of a protein to induce the adhesion of one cell 
population to another cell population. Suitable assays for movement and adhesion include, 
without limitation, those described in: Current Protocols in Immunology, Ed by J. E.
Coligan, A. M. Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene
Publishing Associates and Wiley-Interscience (Chapter 6.12, Measurement of alpha and
beta Chemokines 6.12.1-6.12.28); Taub et al. J. Clin. Invest. 95:1370-1376, 1995; Lind et
al. APMIS 103:140-146, 1995; Muller et al. Eur. J. Immunol. 25:1744-1748; Gruber et al.
J. of Immunol. 152:5860-5867, 1994; Johnston et al. J. of Immunol. 153:1762-1768, 1994.

Hemostatic and Thrombolytic Activity 

A protein of the invention may also exhibit hemostatic or thrombolytic activity. As a 
result, such a protein is expected to be useful in treatment of various coagulation disorders 
(including hereditary disorders, such as hemophilias) or to enhance coagulation and other 
hemostatic events in treating wounds resulting from trauma, surgery or other causes. A 
protein of the invention may also be useful for dissolving or inhibiting formation of 
thromboses and for treatment and prevention of conditions resulting therefrom (such as, for
example, infarction of cardiac and central nervous system vessels (e.g., stroke)).

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for hemostatic and thrombolytic activity include, without limitation, those
described in: Linet et al., J. Clin. Pharmacol. 26:131-140, 1986; Burdick et al.,
Thrombosis Res. 45:413-419, 1987; Humphrey et al., Fibrinolysis 5:71-79 (1991); Schaub,
Prostaglandins 35:467-474, 1988.

Receptor/Ligand Activity

A protein of the present invention may also demonstrate activity as receptors, 
receptor ligands or inhibitors or agonists of receptor/ligand interactions. Examples of such 
receptors and ligands include, without limitation, cytokine receptors and their ligands,
receptor kinases and their ligands, receptor phosphatases and their ligands, receptors 
involved in cell-cell interactions and their ligands (including without limitation, cellular 
adhesion molecules (such as selectins, integrins and their ligands) and receptor/ligand pairs 
involved in antigen presentation, antigen recognition and development of cellular and 
humoral immune responses). Receptors and ligands are also useful for screening of 
potential peptide or small molecule inhibitors of the relevant receptor/ligand interaction.
Proteins of the present invention (including, without limitation, fragments of receptors and
ligands) may themselves be useful as inhibitors of receptor/ligand interactions.

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Suitable assays for receptor-ligand activity include without limitation those
described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D.
H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing Associates and
Wiley-Interscience (Chapter 7.28, Measurement of Cellular Adhesion under static conditions
7.28.1-7.28.22), Takai et al., Proc. Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et
al., J. Exp. Med. 168:1145-1156, 1988; Rosenstein et al., J. Exp. Med. 169:149-160,
1989; Stoltenborg et al., J. Immunol. Methods 175:59-68, 1994; Stitt et al., Cell
80:661-670, 1995.

Anti-Inflammatory Activity 

Proteins of the present invention may also exhibit anti-inflammatory activity. The 
anti-inflammatory activity may be achieved by providing a stimulus to cells involved in the 
inflammatory response, by inhibiting or promoting cell-cell interactions (such as, for 
example, cell adhesion), by inhibiting or promoting chemotaxis of cells involved in the 
inflammatory process, inhibiting or promoting cell extravasation, or by stimulating or 
suppressing production of other factors which more directly inhibit or promote an 
inflammatory response. Proteins exhibiting such activities can be used to treat inflammatory 
conditions (including chronic or acute conditions), including without limitation inflammation
associated with infection (such as septic shock, sepsis or systemic inflammatory response
syndrome (SIRS)), ischemia-reperfusion injury, endotoxin lethality, arthritis,
complement-mediated hyperacute rejection, nephritis, cytokine or chemokine-induced lung injury,
inflammatory bowel disease, Crohn's disease or conditions resulting from overproduction of
cytokines such as TNF or IL-1. Proteins of the invention may also be useful to treat
anaphylaxis and hypersensitivity to an antigenic substance or material.



Tumor Inhibition Activity 

In addition to the activities described above for immunological treatment or 
prevention of tumors, a protein of the invention may exhibit other anti-tumor activities. A 
protein may inhibit tumor growth directly or indirectly (such as, for example, via ADCC). 
A protein may exhibit its tumor inhibitory activity by acting on tumor tissue or tumor 
precursor tissue, by inhibiting formation of tissues necessary to support tumor growth (such 
as, for example, by inhibiting angiogenesis), by causing production of other factors, agents 
or cell types which inhibit tumor growth, or by suppressing, eliminating or inhibiting 
factors, agents or cell types which promote tumor growth. 

Other Activities 

A protein of the invention may also exhibit one or more of the following additional 
activities or effects: inhibiting the growth, infection or function of, or killing, infectious 
agents, including, without limitation, bacteria, viruses, fungi and other parasites; affecting
(suppressing or enhancing) bodily characteristics, including, without limitation, height,
weight, hair color, eye color, skin, fat to lean ratio or other tissue pigmentation, or organ
or body part size or shape (such as, for example, breast augmentation or diminution,
change in bone form or shape); affecting biorhythms or circadian cycles or rhythms;
affecting the fertility of male or female subjects; affecting the metabolism, catabolism,
anabolism, processing, utilization, storage or elimination of dietary fat, lipid, protein,
carbohydrate, vitamins, minerals, cofactors or other nutritional factors or component(s);
affecting behavioral characteristics, including, without limitation, appetite, libido, stress,
cognition (including cognitive disorders), depression (including depressive disorders) and 
violent behaviors; providing analgesic effects or other pain reducing effects; promoting 
differentiation and growth of embryonic stem cells in lineages other than hematopoietic 
lineages; hormonal or endocrine activity; in the case of enzymes, correcting deficiencies of 
the enzyme and treating deficiency-related diseases; treatment of hyperproliferative 
disorders (such as, for example, psoriasis); immunoglobulin-like activity (such as, for 
example, the ability to bind antigens or complement); and the ability to act as an antigen in 
a vaccine composition to raise an immune response against such protein or another
material or entity which is cross-reactive with such protein. 



Particular Applications for Certain Clones 

The following sets out a non-exclusive list of applications for certain embodiments of 
the invention. In the interest of economy, applications relevant to multiple embodiments are 
not duplicated in this list. Other embodiments described below have similar
characteristics, as described therein. The artisan is directed, therefore, to this section for
similar descriptions of the functions of other embodiments.
Testes 

htes3_l 5c24: The new protein can find application in modulation of 2-hydroxyacid
dehydrogenase-dependent pathways and as a new enzyme for biotechnologic
production processes. 

htes3_15i5: The new protein can find application in modulating the structure of the
human sperm radial spoke head and in modulation of sperm motility in men.

htes3_15kl 1 : The novel protein contains a protein kinase ATP-binding region 
signature and a serine/threonine protein kinase active-site signature. The new protein 
can find application in modulation of intracellular signal pathways dependent on this 
kinase. 

htes3_17nl2: The new protein can find application in modulating/blocking the 
expression of SOX-controlled genes. 

htes3_20k2: The new protein can find application as a target for the development of 
new nociception-modulating drugs. 

htes3_20ml8: The new protein can find application in modulation of mitochondrial 
DNA replication and maintenance. 

htes3_20d4: The new protein can find application in the regulation of gene
expression by activation of nuclear GTP-binding proteins. X-linked retinitis
pigmentosa results from a defective GTPase regulator that contains an RCC1-type
repeat.
htes3_21jl5: NY-CO-33 is a protein recognised by autologous antibodies of human
colon cancer patients. The novel protein contains four C2H2 zinc fingers and is a new
putative transcription factor. The new protein can find application in
modulating/blocking the expression of genes controlled by this transcription factor.

The new protein can find application in modulating chromosome transport in mitosis 
and meiosis and modulation of cell division. 

htes3_26g22: The new protein can find application in modulating chromosome 
transport in mitosis and meiosis and modulation of cell division. The novel TBP- 
binding protein is considered to participate in transcription regulation through the 
interaction with TBP. The new protein can find application in modulation of gene 
transcription. 

htes3_21116: The new protein can find application in modulation of protein 
translocation into the endoplasmic reticulum. 

htes3_27dl : The novel protein can find application in modulation of ubiquitin- and 
protein metabolism in cells. 

htes3_2ml 8: The novel protein can find application as a multifunctional
nuclease/exoribonuclease.

htes3_35b4: The new protein can find application in modulation of the mitotic 
spindle. 

htes3_35b5: The novel protein can find application in modulating the v-ATPase 
activity in endocytic and secretory organelles. 

htes3_35e21 : Due to the close relationship to human interleukin-7, the novel 
interleukin is expected to act as a new growth factor for human B lineage cells. 
Additionally, the protein should induce gene rearrangement of the T-cell receptor
repertoire, leading to thymocyte commitment, and subsequently induce both cytotoxic
T cells and lymphokine-activated killer cells. This new interleukin could find clinical
application in a variety of conditions of hematolymphopoietic failure and in different
tumours, because of its recruitment of B lineage cells, cytotoxic T cells and
lymphokine-activated killer cells.
htes3_35kl6: The novel protein is a new fatty acid-CoA synthetase/ligase with an unknown
substrate. The new protein can find application in modulation of fatty acid
metabolism and as a new enzyme for biotechnologic production processes. 

htes3_35nl2: The new protein can find application in modulation of ADP-transport 
and energy metabolism in cells/mitochondria. 

htes3_35n9: The new protein can find application in modulation of carboxylester 
metabolism and as a new enzyme for biotechnologic production processes. 

htes3_35p22: The novel protein is closely related to human tre-2 and other enzymes
involved in the degradation of ubiquitinated proteins. The human tre-2 oncogene 
encodes a deubiquitinating enzyme, indicating a role for the ubiquitin system in 
mammalian growth control. The novel protein can find application in cancer 
diagnostics and treatment, and in regulating protein stability and growth control via 
regulation of ubiquitination. 

htes3_4h6: The novel kinesin protein can find application in modulating the function 
of kinesin and modulating intracellular transport via/on microtubules. 

htes3_72kl5: FGD1-related F-actin-binding protein (Frabin) is a novel
F-actin-binding protein. Mutations in the FGD1 gene appear to be responsible for
faciogenital dysplasia (Aarskog-Scott syndrome). Frabin binds F-actin and shows
F-actin-cross-linking activity. Overexpression of frabin in Swiss 3T3 cells and COS7 cells
induces cell shape change and c-Jun N-terminal kinase activation, as described for FGD1.
Because FGD1 has been shown to serve as a GDP/GTP exchange protein for the Cdc42
small G protein, frabin is likely a direct linker between Cdc42 and the actin
cytoskeleton. In yeast, Cdc42p is essential and transduces signals to the actin
cytoskeleton to initiate and maintain polarized growth and to mitogen-activated
protein morphogenesis. In mammalian cells, Cdc42p regulates a variety of
actin-dependent events and induces the JNK/SAPK protein kinase cascade, which leads to
the activation of transcription factors within the nucleus. The novel protein seems to
be the human orthologue of rat frabin.

The new protein can find application in modulating cell structure and motility as
well as in modulation of the JNK/SAPK pathway.
htes3_72pl6: Like Mem3, the novel protein is similar to yeast VPS (vacuolar protein
sorting) 35. In yeast, the null allele of VPS35 results in a differential defect in the
sorting of vacuolar carboxypeptidase Y (CPY), proteinase A (PrA), proteinase B
(PrB), and alkaline phosphatase (ALP). The new protein can find application in
modulating the sorting of proteins into different compartments.

htes3_7b22: The novel protein is related to paramyosin, a major structural component
of the thick filaments of invertebrate muscle. Paramyosins are promising antigens for
immunization against several parasites, such as Schistosoma mansoni. The new
protein can find application in modulating cell adhesion/motility and
membrane/cytoskeleton structure and dynamics.

htes3J7j3: The new protein is closely related to C-TAK1 and is therefore likely also
involved in cell-cycle regulation. The new protein can find application in
modulating/blocking the cell cycle. 

htes3_7p9: Nuclear domain 10 (ND10), also described as PODs or Kr bodies, is
involved in the development of acute promyelocytic leukemia and in virus-host
interactions. The NDP52 protein is part of this complex structure. In vivo, NDP52 is
transcribed in all human tissues, but is redistributed upon viral infection and interferon
treatment. ND10 plays an important role in the viral life cycle. The novel protein is
similar to NDP52. It contains three leucine zippers and an RGD cell attachment site.
This protein seems to be a novel part of the ND10 complex. The new protein can
find application in modulation of viral infections and tumour events.

htes3_8ml0: The poly(A)-binding protein (PABP) binds to the 3'-poly(A) tail found on
most eukaryotic messenger RNAs (mRNAs) and, together with the poly(A) tail,
has been implicated in governing the stability and the translation of mRNA. The new
protein can find application in modulation of mRNA translation and 
processing/stability. 

Kidney 

hfkd2_24bl 5 : The new protein can find application in modulation of hexose 
metabolism pathways and as a new enzyme for biotechnologic production processes. 
hfkd2_24n20: The new protein seems to be part of the signalling pathway between
tyrosine kinases and the membrane/cytoskeleton. The new protein can find
application in modulating cell adhesion/motility and membrane/cytoskeleton
structure and dynamics.

hfkd2_3ol7: The new protein can find application in modulation of the respiratory 
electron transport chain pathways of mitochondria. 

hfkd2_46j20: The new protein can find application in modulating the 
homoprotocatechuate degradative pathway and as an enzyme for biotechnologic
production processes. 

hfkd2_46kl9: The new protein can find application in modulating/blocking the 
expression of genes controlled by the hepatocyte nuclear factor-1.

hfkd2_46m4: SAR1 proteins are involved in vesicular transport between the
endoplasmic reticulum and the Golgi apparatus. 

hfkd2_46kl4: rab6 is a ubiquitous ras-like GTPase involved in intra-Golgi transport. 
The new protein can find application in modulating the transport of vesicles inside the 
Golgi apparatus. 

Uterus Associated: 

hutel_18il9: The SREBP-2 protein is embedded in the membranes of the nucleus and 
endoplasmic reticulum. In cholesterol-depleted cells the proteins are cleaved to release 
soluble NH2-terminal fragments that enter the nucleus and activate genes encoding 
the low density lipoprotein receptor and enzymes of cholesterol synthesis. The new 
protein is a putative transcription factor capable of protein-protein interaction via a 
LIM domain and additionally shows similarity to the common sunflower transcription
factor SF3. 

huteM 811 : The novel protein is similar to several 40S ribosomal proteins and 
therefore seems to be part of the corresponding ribosomal subunit.

hutel_19g22: The new protein can find application in modulation of tissue
calcification, especially in the uterus.
huteM9hl7: The new protein can find application in modulating the response of
cells to oxysterols. 

hutel_20bl9: The novel protein seems to be a novel enzyme with sarcosine oxidase 
activity. The new protein can find application in modulation of sarcosine metabolism 
and as a new enzyme for biotechnologic production processes. 

hutel_20g21: The novel protein seems to be a new ras inhibitor protein. The new 
protein can find application in modulating/blocking ras dependent signal transduction 
pathways. 

hutel_20hl 3 : The novel protein is a new human alpha-adaptin. The new protein can 
find application in modulating endocytosis and vesicle trafficking in cells. 

hutel_20ml 1 : The new protein can find application in modulating/blocking the 
activity of protein phosphatase-1 and in modulating the cell cycle.

hutel_20m24: This protein is a putative mannosyl transferase that is involved in the 
assembly of the core oligosaccharide Glc3Man9GlcNAc2. The new protein can find 
application in modulation of glycosylation of proteins and as a new enzyme for 
biotechnologic production processes. 

hutel_22el2: The new protein can find application in modulating the cornichon-mediated
signal transduction pathway and also EGF receptor signaling processes.

hutel_23el3: The novel protein contains a serine protease domain of the subtilase family with
an aspartic acid-containing active site. The new protein can find application in 
modulation of proteinase activity in cells and as a new enzyme for proteomics and 
biotechnologic production processes. 

hutel_24j6: The new protein can find application in modulation of cell-cell adhesion.

hutel_24h3: The new protein can find application as a useful marker for chondro- 
osteogenic cell differentiation and for the modulation of chondro-osteogenic cell 
differentiation. 

Fetal Brain: 

hfbr2_16cl6: The new protein can find application in modulating/blocking
cytoskeleton-membrane protein interactions.
hfbr2_23b21 : The new protein can find application in modulating/blocking the
guanylate cyclase pathway.

hfbr2_23bl0: The new protein can find application in modulation of splicing. 

hfbr2_2b5: The novel protein contains the typical (xxG)n repeat of collagen proteins 
and a Pfam von Willebrand factor type A domain. Therefore, the protein seems to be a 
new collagen alpha chain. The new protein can find application in modulation of 
connective tissue, bone and cartilage development and maintenance.

hfbr2_2cl7: The new protein can find application in modulating/blocking G-protein- 
dependent pathways. 

hfbr2_2dl5: The new protein can find application in modulating early 
spermatogenesis. 

hfbr2_2il7: The new protein can find clinical application in modulating the transport 
of glycoproteins inside cells, especially of the LDL receptor. 

hfbr2_2kl4: Tumour-suppressor genes are known to be involved in the control of cell 
growth and division, interacting with proteins which control the cell cycle. The N33 
gene is significantly methylated in tumour cells, a mechanism by which
tumour-suppressor genes are inactivated in cancer. In addition, the novel protein contains an
RGD cell attachment site. Therefore, the gene encoding the novel protein is a new putative
tumour-suppressor gene.

hfbr_3cl8: RNA helicases comprise a large family of proteins that are involved in
basic biological processes such as nuclear and mitochondrial splicing, RNA
editing, rRNA processing, translation initiation, nuclear mRNA export, and mRNA 
degradation. RNA helicases are essential factors in cell development and 
differentiation, and some of them play a role in transcription and replication of viral 
single-stranded RNA genomes. The members of the largest subgroup, the DEAD and 
DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. The novel protein contains a DEAD-box and is a new member of this 
subgroup. 

hfbr_3g8: The new protein can find application in modulating NAT assembly and action
and may therefore be important in the metabolism of drugs and environmental mutagens.
hfbr2_62bl1: The rac small GTPase is associated with type-I phosphatidylinositol
4-phosphate 5-kinase and regulates the production of phosphatidylinositol
4,5-bisphosphate. The new protein is expected to activate p21rac-related small GTPases.

hfbr2_62ol7: The new protein can find application in modulation of cholesterol 
binding and transport by LDL-receptors and LDL-binding proteins. 

hfbr_6b24: The new protein can find application in modulation of rhamnose 
metabolism and as a new enzyme for biotechnological production processes.

hfbr_72bl8: The new protein can find application in modulating DNA repair and
mutagenesis. 

hfbr_78c4: The new protein can find application in modulating/blocking the response 
of cells to interferons. 

hfbr_78k24: Enzymes of this family are involved in the processing of poly-ubiquitin
precursors as well as of ubiquitinated proteins. The new protein can find
application in modulation of protein stability/degradation in cells.

hfbr_82e4: The new protein can find clinical application in modulating/blocking 
calmodulin-mediated pathways in human neuronal cells. 

VARIANTS OF THE INVENTIVE DNA MOLECULES 
Variants in General 

"Variants," according to the invention, include DNA and/or protein molecules that
resemble, structurally and/or functionally, those set forth herein. Variants may be isolated
from natural sources ("homologs"), may be entirely synthetic or may be based in part on both 
natural and synthetic approaches. 

The section set forth below presents various structural and functional characteristics of 
molecules within the invention. Preferred molecules are characterized by a combination of 
one or more of these characteristics. For instance, some preferred molecules are described 
with reference to at least two structural characteristics, while others may be described with 
reference to at least one structural and at least one functional characteristic. 

It will be recognized by the skilled artisan that structure ultimately defines function, 
i.e., the functions of the molecules described herein derive from the structures of those
molecules. Accordingly, the structural variants described below that bear the closest
structural relationship (as variously defined below) to the inventive molecules are the variants 
that most likely will preserve biological function. This relationship between structure and 
function will guide the skilled artisan in identifying the preferred embodiments of the 
invention. 

Splicing Variants 

It is well known that eukaryotic structural genes are composed of both protein-coding
and non-coding portions. When messenger RNA is transcribed from the DNA template,
the initial transcript contains introns, which are non-coding, and exons, which are coding. To form a
translation-competent mRNA, the introns must be "spliced" out of this initial pre-mRNA.

Specific sequences within the pre-mRNA represent "splice junctions" that direct the
cellular splicing machinery to the appropriate position. The splice junctions are loosely
conserved sequence regions of the pre-mRNA; introns almost invariably begin with GT and
end with AG (DNA perspective). The 5' end of the splice junction typically contains about
nine somewhat conserved residues, for example, C/AAGTA/GAGT. The 3' end usually
contains a pyrimidine-rich stretch of at least about 11 nucleotides, followed by NC/TAGG.
Splicing occurs before the GT and after the AG. Mount, Nucleic Acids Res. 10:459-72
(1982).
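As a minimal illustration of the GT-AG rule described above, the following Python sketch removes a single intron from a pre-mRNA written in DNA letters (the "DNA perspective" used above). The function name and the convention that the caller supplies the intron coordinates are assumptions made for illustration; actual splice-site recognition depends on much more than the terminal dinucleotides.

```python
def splice_intron(pre_mrna: str, start: int, end: int) -> str:
    """Remove the intron pre_mrna[start:end] (0-based, end exclusive).

    Hypothetical helper: it checks only the GT...AG boundary dinucleotides.
    """
    intron = pre_mrna[start:end]
    if len(intron) < 4 or not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError("intron does not follow the GT-AG rule")
    # Cleavage occurs immediately before the GT and immediately after the AG,
    # so the two flanking exons are joined directly.
    return pre_mrna[:start] + pre_mrna[end:]


# Invented example: the intron starts GTAAGT and ends with a pyrimidine-rich stretch plus AG.
exon1, intron, exon2 = "ATGGCC", "GTAAGTCTTTTCCTTCCCAG", "GCTTAA"
mature = splice_intron(exon1 + intron + exon2, len(exon1), len(exon1) + len(intron))
print(mature)  # ATGGCCGCTTAA
```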

Interestingly, exons often correspond to discrete functional domains of the protein 
product. The intron/exon arrangement thus creates a linear array of nucleotides which can be 
correlated to discrete, and often interchangeable, functional protein fragments. Go, Nature 
291:90-92 (1981); Branden et al., EMBO J. 3:1307-10 (1984). This linear arrangement
creates the possibility of generating multiple different full-length proteins by combining
different subsets of the functional portions in the array. For example, if a set of exons is
arranged 1-2-3-4, where (-) represents the introns separating the exons, a splicing event need
not simply produce 1234, but may produce 123, 134, 124 and so on. Production of different
mRNA products in this way is commonly called "alternative splicing." Andreadis et al., Ann.
Rev. Cell Biol. 3:207-42 (1987).
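The combinatorics of the 1-2-3-4 example can be made concrete with a short Python sketch that lists every order-preserving selection of exons. It enumerates possibilities only; which products a cell actually makes is regulated and cannot be predicted this way. The function name and the `min_exons` cutoff are hypothetical.

```python
from itertools import combinations


def exon_combinations(exons: list[str], min_exons: int = 2) -> list[str]:
    """List order-preserving exon selections, longest first.

    Exons keep their genomic order (1 always precedes 3, and so on); some
    exons are simply skipped, as in the 123, 134 and 124 examples.
    """
    products = []
    for k in range(len(exons), min_exons - 1, -1):
        for combo in combinations(exons, k):  # combinations() preserves input order
            products.append("".join(combo))
    return products


print(exon_combinations(["1", "2", "3", "4"]))
# ['1234', '123', '124', '134', '234', '12', '13', '14', '23', '24', '34']
```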

Some of the present DNA molecules can be represented in modular fashion in terms of 
their coding regions. Essentially, these modules are exons (though each "exon" may in fact 
be made up of several exons), which may be combined in different ways to form a variety of 
different DNA molecules, each encoding a different functional protein, as
indicated below.
Splicing variants are



Degenerate Variants 

One aspect of the present invention provides "degenerate variants" of the nucleic acid 
fragments of the present invention. A "degenerate variant" is a nucleotide fragment which 
differs from those of inventive molecules by nucleotide sequence, but due to the degeneracy 
of the genetic code, encodes an identical polypeptide sequence. 

Given the known relationship between DNA sequences and the proteins they encode, 
degenerate variants typically are described by reference to this relationship. It is well known 
that the degeneracy of the genetic code results in many possible DNA sequences which 
encode a particular protein. Indeed, of the three bases which comprise an amino acid-encoding
triplet, the third position almost always may vary without changing the encoded amino acid. This fact
alone allows for a class of variant DNA molecules which encode protein sequences identical
to those disclosed herein, yet have about 30% sequence variation. In other words, the variant 
DNA molecules are about 70% identical to the inventive DNAs, having no additional or 
deleted sequences. Thus, one aspect of the invention provides degenerate variant DNA 
molecules encoding the inventive protein sequences. 
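Whether a candidate sequence is a degenerate variant can be checked computationally: translate both DNAs, compare the proteins, and then compute ungapped nucleotide identity (degenerate variants have no additional or deleted sequences). The Python sketch below assumes the standard genetic code (NCBI translation table 1); the two example sequences are invented for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases ordered T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity of two equal-length sequences, in percent."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


original = "ATGCTGAGCCGT"  # Met-Leu-Ser-Arg
variant = "ATGCTAAGTCGC"   # only third codon positions changed
print(translate(original), translate(variant))  # MLSR MLSR
print(percent_identity(original, variant))      # 75.0
```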

In one embodiment, these variants have at least about 70% sequence identity with the 
DNA molecules described herein. In a preferred embodiment, these variants have at least 
about 80% sequence identity to the inventive molecules. In a more preferred embodiment 
these variants have at least about 90% sequence identity with the inventive molecules. 

Conservative Amino Acid Variants 

Variants according to the invention also may be made that conserve the overall 
molecular structure of the encoded proteins. Given the properties of the individual amino 
acids comprising the disclosed protein products, some rational substitutions will be recognized 
by the skilled worker. Amino acid substitutions, i.e. "conservative substitutions," may be 
made, for instance, on the basis of similarity in polarity, charge, solubility, hydrophobicity, 
hydrophilicity, and/or the amphipathic nature of the residues involved. 

For example: (a) nonpolar (hydrophobic) amino acids include alanine, leucine, 
isoleucine, valine, proline, phenylalanine, tryptophan, and methionine; (b) polar neutral 
amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine; 
(c) positively charged (basic) amino acids include arginine, lysine, and histidine; and (d)
negatively charged (acidic) amino acids include aspartic acid and glutamic acid. Substitutions
typically may be made within groups (a)-(d). In addition, glycine and proline may be
substituted for one another based on their ability to disrupt α-helices. Similarly, certain
amino acids, such as alanine, cysteine, leucine, methionine, glutamic acid, glutamine,
histidine and lysine, are more commonly found in α-helices, while valine, isoleucine,
phenylalanine, tyrosine, tryptophan and threonine are more commonly found in β-pleated
sheets. Glycine, serine, aspartic acid, asparagine, and proline are commonly found in turns. 
Some preferred substitutions may be made among the following groups: (i) S and T; (ii) P and 
G; and (iii) A, V, L and I. Given the known genetic code, and recombinant and synthetic 
DNA techniques, the skilled scientist readily can construct DNAs encoding the conservative 
amino acid variants. 

As used herein, "sequence identity" between two polypeptide sequences indicates the 
percentage of amino acids that are identical between the sequences. "Sequence similarity" 
indicates the percentage of amino acids that either are identical or that represent conservative 
amino acid substitutions. 
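The difference between "sequence identity" and "sequence similarity" can be illustrated with a Python sketch that scores two equal-length, already aligned polypeptides. It treats as conservative any pair of different residues falling within one of groups (a)-(d) above or the preferred glycine/proline pair; this grouping is one reading of the text rather than a standard substitution matrix, and the example sequences are hypothetical.

```python
# Residue groups taken from the text: (a) nonpolar, (b) polar neutral,
# (c) basic, (d) acidic, plus the preferred glycine/proline pair.
CONSERVATIVE_GROUPS = ("AVLIPFWM", "GSTCYNQ", "RKH", "DE", "PG")


def is_conservative(x: str, y: str) -> bool:
    """True when two different residues fall in the same group."""
    return x != y and any(x in group and y in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and percent similarity of two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty sequences of equal length")
    identical = sum(x == y for x, y in zip(a, b))
    conservative = sum(is_conservative(x, y) for x, y in zip(a, b))
    n = len(a)
    return 100.0 * identical / n, 100.0 * (identical + conservative) / n


# M/M and V/V are identical; K/R, L/I, D/E, G/P and S/T are conservative; E/Q is neither.
print(identity_and_similarity("MKLVDEGS", "MRIVEQPT"))  # (25.0, 87.5)
```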

Functionally Equivalent Variants 

Yet another class of DNA variants within the scope of the invention may be described 
with reference to the product they encode. As shown below, some of the inventive DNA 
molecules encode a protein having a degree of homology with known proteins, or protein 
domains. It is expected, therefore, that they will have some or all of the requisite functional 
features of such molecules. These "functionally equivalent variant" products are
characterized by the fact that they are functionally equivalent, with respect to biological 
activity, to certain known molecules. 

The instant invention provides information on common structural motifs, including 
consensus sequences that will guide the artisan in constructing functionally equivalent 
variants. It will be understood that the motifs, identified for each inventive protein, may be 
modified within the identified consensus sequences. Thus, the invention contemplates the 
proteins disclosed herein that contain variability in the consensus sequences identified, and the 
invention further contemplates the full range of nucleic acids encoding them, and the 
complements of those nucleic acids. 
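Consensus motifs of this kind are often written in PROSITE pattern syntax, for example `[ST]-x-[RK]`, where `x` is any residue, brackets list allowed residues and braces list forbidden ones. As a sketch, assuming motifs are expressed in that syntax, the following Python code converts a simple pattern into a regular expression so that a candidate variant can be checked for whether it still satisfies the consensus. Only the basic elements are handled; anchors (`<`, `>`) and other extensions are not, and the function name is hypothetical.

```python
import re

# One PROSITE element: x, [allowed], {forbidden} or a single residue,
# optionally followed by a repeat count such as (2) or (2,4).
_ELEMENT = re.compile(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # a bracketed set or a single residue is already valid regex
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


motif = re.compile(prosite_to_regex("[ST]-x-[RK]"))
for candidate in ("MASGKL", "MATGRL", "MAAGKL"):
    print(candidate, bool(motif.search(candidate)))
# MASGKL True
# MATGRL True
# MAAGKL False
```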
Hybridizing Variants

DNA variants within the invention also may be described by reference to their 
physical properties in hybridization. One skilled in the field will recognize that DNA can be 
used to identify its complement and, since DNA is double stranded, its equivalent or 
homolog, using nucleic acid hybridization techniques. It will also be recognized that 
hybridization can occur with less than 100% complementarity. However, given appropriate 
choice of conditions, hybridization techniques can be used to differentiate among DNA 
sequences based on their structural relatedness to a particular probe. For guidance regarding 
such conditions see, for example, Sambrook et al., 1989, MOLECULAR CLONING, A
LABORATORY MANUAL, Cold Spring Harbor Press, N.Y.; and Ausubel et al., 1989,
CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Greene Publishing Associates and
Wiley Interscience, N.Y.

Structural relatedness between two polynucleotide sequences can be expressed as a 
function of "stringency" of the conditions under which the two sequences will hybridize with 
one another. As used herein, the term "stringency" refers to the extent that the conditions 
disfavor hybridization. Stringent conditions strongly disfavor hybridization, and only the 
most structurally related molecules will hybridize to one another under such conditions. 
Conversely, non-stringent conditions favor hybridization of molecules displaying a lesser 
degree of structural relatedness. Hybridization stringency, therefore, directly correlates with 
the structural relationships of two nucleic acid sequences. The following relationships are 
useful in correlating hybridization and relatedness (where Tm is the melting temperature of a
nucleic acid duplex): 

a. $T_m = 69.3 + 0.41\,(\%\mathrm{G}+\mathrm{C})$

b. The $T_m$ of a duplex DNA decreases by 1°C with every increase of 1% in the
number of mismatched base pairs.

c. $(T_m)_{\mu_2} - (T_m)_{\mu_1} = 18.5 \log_{10}(\mu_2/\mu_1)$

where $\mu_1$ and $\mu_2$ are the ionic strengths of two solutions.
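The three relationships can be expressed as small Python functions. This is a sketch only: the formulas are the empirical approximations given above for DNA duplexes, the reference salt concentration for relationship (a) is not stated here, and chaining the three additively, as in the example, is an assumption made for illustration.

```python
import math


def tm_from_gc(gc_percent: float) -> float:
    """Relationship (a): Tm (°C) = 69.3 + 0.41 * (%G+C)."""
    return 69.3 + 0.41 * gc_percent


def tm_with_mismatches(tm_perfect: float, mismatch_percent: float) -> float:
    """Relationship (b): Tm falls by about 1 °C per 1% mismatched base pairs."""
    return tm_perfect - mismatch_percent


def tm_salt_shift(mu1: float, mu2: float) -> float:
    """Relationship (c): (Tm at mu2) - (Tm at mu1) = 18.5 * log10(mu2 / mu1)."""
    return 18.5 * math.log10(mu2 / mu1)


tm = tm_from_gc(50.0)                                # 50% G+C duplex
tm_mismatched = tm_with_mismatches(tm, 10.0)         # same duplex with 10% mismatches
tm_dilute = tm_mismatched + tm_salt_shift(1.0, 0.1)  # after a ten-fold drop in ionic strength
print(round(tm, 1), round(tm_mismatched, 1), round(tm_dilute, 1))  # 89.8 79.8 61.3
```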

Hybridization stringency is a function of many factors, including overall DNA 
concentration, ionic strength, temperature, probe size and the presence of agents which 
disrupt hydrogen bonding. Factors promoting hybridization include high DNA 
concentrations, high ionic strengths, low temperatures, longer probe size and the absence of
agents that disrupt hydrogen bonding. 

Hybridization usually is done in two stages. First, in the "binding" stage, the probe is 
bound to the target under conditions favoring hybridization. Stringency is usually controlled 
at this stage by altering the temperature. For high stringency, the temperature is usually 
between 65°C and 70°C, unless short (<20 nt) oligonucleotide probes are used. A 
representative hybridization solution comprises 6X SSC, 0.5% SDS, 5X Denhardt's solution 
and 100 ng of non-specific carrier DNA. See Ausubel et al., supra, section 2.9, supplement
27 (1994). Of course, many different yet functionally equivalent buffer conditions are
known. Where the degree of relatedness is lower, a lower temperature may be chosen. Low
stringency binding temperatures are between about 25°C and 40°C. Medium stringency is
from at least about 40°C to less than about 65°C. High stringency is at least about 65°C.
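The binding-temperature ranges just given can be summarized as a simple lookup. The Python sketch below treats the "about" boundaries as hard cutoffs and refuses probes shorter than 20 nucleotides, for which the text notes the ranges do not apply; both choices are simplifying assumptions, and the function name is hypothetical.

```python
from typing import Optional


def binding_stringency(temp_c: float, probe_length_nt: Optional[int] = None) -> str:
    """Classify a hybridization ("binding") temperature using the ranges above."""
    if probe_length_nt is not None and probe_length_nt < 20:
        raise ValueError("ranges are given for probes of at least 20 nt")
    if temp_c >= 65:
        return "high"
    if temp_c >= 40:
        return "medium"
    if temp_c >= 25:
        return "low"
    return "below the stated ranges"


print([binding_stringency(t) for t in (68, 50, 30, 20)])
# ['high', 'medium', 'low', 'below the stated ranges']
```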

Second, the excess probe is removed by washing. It is at this stage that more stringent 
conditions usually are applied. Hence, it is this "washing" stage that is most important in 
determining relatedness via hybridization. Washing solutions typically contain lower salt 
concentrations. One exemplary medium stringency solution contains 2X SSC and 0.1% SDS. 
A high stringency wash solution contains the equivalent (in ionic strength) of less than about 
0.2X SSC, with a preferred stringent solution containing about 0.1X SSC. The temperatures
associated with various stringencies are the same as discussed above for "binding." The
washing solution also typically is replaced a number of times during washing. For example,
typical high stringency washing conditions comprise washing twice for 30 minutes at 55°C
and three times for 15 minutes at 60°C.
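Relationship (c) above suggests why lowering the salt concentration of the wash raises stringency: at a fixed wash temperature, a lower Tm means that imperfectly matched duplexes melt first. Because ionic strength scales in proportion to buffer concentration, the ratio of SSC strengths can stand in for the ratio of ionic strengths. The short Python sketch below (the function name is hypothetical) estimates the resulting drop in duplex Tm when moving from the medium stringency wash (2X SSC) to high stringency washes (0.2X and 0.1X SSC); it ignores SDS and any other components of the solutions.

```python
import math


def ssc_tm_shift(ssc_from: float, ssc_to: float) -> float:
    """Estimated change in duplex Tm (°C) when the SSC concentration changes.

    Uses (Tm)mu2 - (Tm)mu1 = 18.5 * log10(mu2 / mu1); ionic strength is
    proportional to SSC concentration, so the SSC ratio replaces mu2 / mu1.
    """
    return 18.5 * math.log10(ssc_to / ssc_from)


for wash in (0.2, 0.1):
    print(f"2X -> {wash}X SSC: {ssc_tm_shift(2.0, wash):.1f} °C")
# 2X -> 0.2X SSC: -18.5 °C
# 2X -> 0.1X SSC: -24.1 °C
```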

The present invention includes nucleic acid molecules that hybridize to the inventive 
molecules under high stringency binding and washing conditions. More preferred molecules 
(from an mRNA perspective) are those that are at least 50% of the length of any one of those
depicted below. Particularly preferred molecules are at least 75% of the length of those
molecules. 

Substitutions, Insertions, Additions and Deletions 

In a general sense, the preferred DNA variants of the invention are those that retain 
the closest relationship, as described by "sequence identity" to the inventive DNA molecules. 
According to another aspect of the invention, therefore, substitutions, insertions, additions 
and deletions of defined properties are contemplated. It will be recognized that sequence 
identity between two polynucleotide sequences, as defined herein, generally is determined
with reference to the protein coding region of the sequences. Thus, this definition does not at 
all limit the amount of DNA, such as vector DNA, that may be attached to the molecules 
described herein. Preferred DNA sequence variants include molecules encoding proteins 
sharing some or all of any relevant biological activity of the native molecule. 

In creating these variants, the skilled worker will be guided by reference to the protein 
structure. First, insertions and deletions in any recognized functional domain, above, 
generally should be avoided, except as noted below in the section entitled "Proteins," where 
this domain is discussed in detail. Alterations in such domains usually will be limited to 
conservative amino acid substitutions. In addition, where insertions and deletions are desired, 
they may be made at the N- and/or C-terminus of the protein molecule (or the
corresponding coding regions of the DNA). If insertions or deletions are made within the 
protein, deletions of major structural features usually should be avoided. Thus, a preferred 
place to make insertion or deletion variants is in non-structural regions, such as linker regions 
between two alpha helices. 

"Substitutions" generally refer to alterations in the DNA sequence which do not 
change its overall length, but only alter one or more nucleotide positions, substituting one for 
another in the common sense of the word. One class of preferred substitutions, "degenerate 
substitutions," are those that do not alter the encoded amino acid sequence. Some substitutions
retain 50%, 55%, 60% or 65% identity. Preferred substitutions retain at least about 70%
identity, more preferably at least 70% or 75% identity, with the inventive DNAs. Some more 
preferred molecules have at least about 80% identity, more preferably at least 80% or 85% 
identity. Particularly preferred DNAs share at least about 90% identity, more preferably at 
least 90% or 95% identity. 

"Insertions," unlike substitutions, alter the overall length of the DNA molecule, and 
thus sometimes the encoded protein. Insertions add extra nucleotides to the interior (not the 
5' or 3' ends) of the subject DNAs. Preferred insertions are made with reference to the 
protein sequence encoded by the DNA. Thus, it is most preferred to provide an insertion in 
the DNA at a location that corresponds to an area of the encoded protein which lacks 
structure. For instance, it typically would not be beneficial, if the preservation of biological 
activity is desired, to provide an insertion within an alpha-helical region or a beta-pleated 
sheet. Accordingly, non-structural areas, such as those containing helix-breaking glycines 
and proline residues, are most preferred sites of insertion. Other preferred sites of insertion
are the splice sites, which are indicated above in the description of the inventive DNA 
molecules. 

While the optimal size of insertions will vary depending upon the site of insertion and 
its effect on the overall conformation of the encoded protein, some general guides are useful. 
Generally, the total insertions (irrespective of their number) should not add more than about 
30% (or preferably not more than 30%) to the overall size of the encoded protein. More 
preferably, the insertion adds less than about 10-20% (yet more preferably 10-20%) in size, 
with less than about 10% being most preferred. The number of insertions is limited only by 
the number of suitable insertion sites, and secondarily by the foregoing size preferences.

"Additions," like insertions, also add to the overall size of the DNA molecule, and 
usually the encoded protein. However, instead of being made within the molecule, they are 
made on the 5' or 3' end, usually corresponding to the N- or C-terminus of the encoded
protein. Unlike deletions, additions are not very size-dependent. Indeed, additions may be of 
virtually any size. Preferred additions, however, do not exceed about 100% of the size of the 
native molecule. More preferably, they add less than about 60 to 30% to the overall size, 
with less than about 30% being most preferred. 

"Deletions" diminish the overall size of the DNA and, therefore, also reduce the size 
of the protein encoded by that DNA. Deletions may be made from either end of the molecule 
or internal to it. Typical preferred deletions remove discrete structural features of the 
encoded protein. For example, some deletions will comprise the deletion of one or more 
exons which may define a structural feature. Preferred deletions remove less than about 30% 
of the size of the subject molecule. More preferred deletions remove less than about 20% and 
most preferred deletions remove less than about 10%.
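The size preferences for insertions, additions and deletions can be collected into one table and applied to a proposed variant. In this Python sketch the "about" limits are treated as hard cutoffs, and where the text gives a range (10-20% for insertions, 60 to 30% for additions) the upper end is used as the "more preferred" limit; these are interpretive assumptions, and the names are hypothetical.

```python
# Size limits, as % of the native molecule: (preferred, more preferred, most preferred).
SIZE_LIMITS = {
    "insertion": (30.0, 20.0, 10.0),
    "addition": (100.0, 60.0, 30.0),
    "deletion": (30.0, 20.0, 10.0),
}


def size_tier(kind: str, residues_changed: int, native_length: int) -> str:
    """Classify a change by its size relative to the native protein.

    For insertions, residues_changed is the total of all insertions combined.
    """
    percent = 100.0 * residues_changed / native_length
    preferred, more_preferred, most_preferred = SIZE_LIMITS[kind]
    if percent < most_preferred:
        return "most preferred"
    if percent < more_preferred:
        return "more preferred"
    if percent <= preferred:
        return "preferred"
    return "outside the stated preferences"


print(size_tier("insertion", 15, 300))  # 5%: most preferred
print(size_tier("addition", 120, 300))  # 40%: more preferred
print(size_tier("deletion", 100, 300))  # about 33%: outside the stated preferences
```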

Computer-Defined Variants and Definition of "Sequence Identity"

In general, both the DNA and protein molecules of the invention can be defined with 
reference to "sequence identity." As used herein, "sequence identity" refers to a comparison 
made between two molecules using, for example, the standard Smith-Waterman algorithm
that is well known in the art. 
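Because this definition refers to the Smith-Waterman algorithm, a compact Python sketch of it may be helpful. It uses a linear gap penalty and illustrative scores (match +2, mismatch -1, gap -2); these are assumptions, not the parameters of any particular program. Percent identity is then computed over all columns of the local alignment, gaps included, which is only one of several conventions in use.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment with a linear gap penalty.

    Returns (best score, aligned segment of a, aligned segment of b); gaps are '-'.
    """
    def score(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0,
                h[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                h[i - 1][j] + gap,  # gap in b
                h[i][j - 1] + gap,  # gap in a
            )
            if h[i][j] > best:
                best, best_i, best_j = h[i][j], i, j

    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical non-gap columns as a percentage of all aligned columns."""
    if not aligned_a:
        return 0.0
    matches = sum(x == y != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


print(smith_waterman("ACGTTACG", "ACGTACG"))     # (12, 'ACGTTACG', 'ACG-TACG')
print(percent_identity("ACGTTACG", "ACG-TACG"))  # 87.5
```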

Some molecules have at least about 50%, 55% or 60% identity. Preferred molecules
are those having at least about 65% sequence identity, more preferably at least 65% or 70% 
sequence identity. Other preferred molecules have at least about 80%, more preferably at 
least 80% or 85%, sequence identity. Particularly preferred molecules have at least about
90% sequence identity, more preferably at least 90% sequence identity. Most preferred 
molecules have at least about 95%, more preferably at least 95%, sequence identity. As used 
herein, two nucleic acid molecules or proteins are said to "share significant sequence identity" 
if the two contain regions which possess greater than 85% sequence (amino acid or nucleic 
acid) identity. 

"Sequence identity" is defined herein with reference to the BLAST 2 algorithm, which is
available at the NCBI (http://www.ncbi.nlm.nih.gov/BLAST), using default parameters.
References pertaining to this algorithm include those found at
http://www.ncbi.nlm.nih.gov/BLAST/blast_references.html; Altschul, S.F., Gish, W., Miller,
W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 
215:403-410; Gish, W. & States, D.J. (1993) "Identification of protein coding regions by 
database similarity search." Nature Genet. 3:266-272; Madden, T.L., Tatusov, R.L. & Zhang, 
J. (1996) "Applications of network BLAST server" Meth. Enzymol. 266:131-141; Altschul, 
S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." 
Nucleic Acids Res. 25:3389-3402; and Zhang, J. & Madden, T.L. (1997) "PowerBLAST: A 
new network BLAST application for interactive or automated sequence analysis and 
annotation." Genome Res. 7:649-656. 

METHODS OF MAKING VARIANTS 

It will be recognized that variants of the inventive molecules can be constructed in 
several different ways. For example, they may be constructed as completely synthetic DNAs. 
Methods of efficiently synthesizing oligonucleotides in the range of 20 to about 150 
nucleotides are widely available. See Ausubel et al., supra, section 2.11, Supplement 21
(1993). Overlapping oligonucleotides may be synthesized and assembled in a fashion first
reported by Khorana et al., J. Mol. Biol. 72:209-217 (1971); see also Ausubel et al., Section
8.2. The synthetic DNAs are designed with convenient restriction sites engineered at the 5'
and 3' ends of the gene to facilitate cloning into an appropriate vector.
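One common scheme for assembling a synthetic gene from overlapping oligonucleotides is sketched below in Python. The target sequence is cut into pieces no longer than a chosen length, adjacent pieces share a fixed overlap, and alternate pieces are written as reverse complements so that neighbours anneal through their overlaps. The 60-nt length and 20-nt overlap defaults are illustrative assumptions within the 20 to 150 nucleotide range mentioned above; practical designs also balance overlap melting temperatures and avoid secondary structure. The function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def design_assembly_oligos(gene: str, oligo_len: int = 60, overlap: int = 20) -> list[str]:
    """Split gene into overlapping oligos, alternating top and bottom strands."""
    if not 0 < overlap < oligo_len:
        raise ValueError("need 0 < overlap < oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        piece = gene[start:start + oligo_len]
        # Even-numbered oligos are top-strand sequence, odd-numbered ones bottom-strand.
        oligos.append(piece if len(oligos) % 2 == 0 else reverse_complement(piece))
        if start + oligo_len >= len(gene):
            break
        start += step
    return oligos


# Arbitrary example sequence, not one of the inventive DNAs.
gene = ("ATGACCATGATTACGCCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCC"
        "CCGGGTACCGAGCTCGAATTCACTGGCCGTCGTTTTACAACGTCGTGA")
for oligo in design_assembly_oligos(gene, oligo_len=40, overlap=15):
    print(oligo)
```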

An alternative method of generating variants is to start with one of the inventive 
DNAs and then to conduct site-directed mutagenesis. See Ausubel et al., supra, chapter 8,
Supplement 37 (1997). In a typical method, a target DNA is cloned into a single-stranded 






WO 01/12659 PCT/IB00/01496 

DNA bacteriophage vehicle. Single-stranded DNA is isolated and hybridized with an
oligonucleotide containing the desired nucleotide alteration(s). The complementary strand is 
synthesized and the double stranded phage is introduced into a host. Some of the resulting 
progeny will contain the desired mutant, which can be confirmed using DNA sequencing. In 
addition, various methods are available that increase the probability that the progeny phage 
will be the desired mutant. These methods are well known to those in the field and kits are 
commercially available for generating such mutants. 
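The mutagenic oligonucleotide in this procedure is essentially the target sequence carrying the desired change, flanked on each side by enough matching sequence to anneal stably. The Python sketch below builds such an oligonucleotide for a single-base substitution. The function name and the 15-nt default flank are illustrative assumptions, and the oligonucleotide is written in the same sense as `target`, so it anneals to the complementary single strand carried by the phage.

```python
def mutagenic_oligo(target: str, position: int, new_base: str, flank: int = 15) -> str:
    """Copy of target around `position` (0-based) carrying one substituted base.

    Up to `flank` matching nucleotides are kept on each side (fewer near the ends).
    """
    new_base = new_base.upper()
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if target[position].upper() == new_base:
        raise ValueError("new_base is identical to the existing base")
    left = target[max(0, position - flank):position]
    right = target[position + 1:position + 1 + flank]
    return left + new_base + right


print(mutagenic_oligo("AAAAACCCCCGGGGGTTTTT", 10, "A", flank=5))  # CCCCCAGGGGT
```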

ISOLATING HOMOLOGS 

Methods 

By using the sequences disclosed herein as probes or as primers, and techniques such 
as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs. 
"Homologs" are essentially naturally-occurring variants and include allelic, species-specific 
and tissue-specific variants. 

Region-specific primers or probes derived from the nucleotide sequence(s) provided 
can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies 
containing cloned DNA encoding a homolog using known methods (Innis et al., PCR
Protocols, Academic Press, San Diego, CA (1990)). Such an application is useful in
diagnostic methods, as described in more detail below, as well as in preparing full-length
DNAs from various sources. The PCR primers are preferably at least 15 bases, and more
preferably at least 18 bases, in length. When selecting a primer sequence, it is preferred that
the two primers of a pair have approximately the same G+C content, so that their melting
temperatures are approximately the same. As a general guide, the formula
Tm (°C) = 3(G+C) + 2(A+T), where G, C, A and T are the numbers of each base in the primer, is
useful.
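The primer guideline above can be applied with a few lines of Python. The sketch computes the rule-of-thumb melting temperature from base counts and reports the difference between the two primers of a pair. It uses the weights given in the text (3 °C per G or C, 2 °C per A or T) as defaults; the widely cited Wallace rule instead uses 4 °C per G or C, so both weights are parameters. Either way the estimate is only a rough guide for short oligonucleotides, and the function names and primer sequences are hypothetical.

```python
def primer_tm(primer: str, gc_weight: float = 3.0, at_weight: float = 2.0) -> float:
    """Rule-of-thumb Tm (°C) from base counts: gc_weight*(G+C) + at_weight*(A+T)."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc_weight * gc + at_weight * at


def tm_difference(forward: str, reverse: str, **weights: float) -> float:
    """Absolute Tm difference between the two primers of a pair."""
    return abs(primer_tm(forward, **weights) - primer_tm(reverse, **weights))


print(primer_tm("GCGCAT"))                 # 3*4 + 2*2 = 16.0
print(primer_tm("GCGCAT", gc_weight=4.0))  # 4*4 + 2*2 = 20.0
# Both 18-mers contain 8 G+C and 10 A+T, so their estimated Tm values match.
print(tm_difference("GGGGCCCCAAAATTTTAA", "GCGCGCGCATATATATAT"))  # 0.0
```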

When using primers derived from the inventive sequences, one skilled in the art will 
recognize that by employing high stringency conditions (e.g., annealing at 50-60°C), only 
sequences with greater than 75% sequence identity to the primer will be amplified. By 
employing lower stringency conditions (e.g., annealing at 35-37°C), sequences which have 
greater than 40-50% sequence identity to the primer also will be amplified. 

The PCR product may be subcloned and sequenced to confirm that it indeed displays 
the expected sequence identity. The PCR fragment may then be used to isolate a full length 
cDNA clone by a variety of methods. For example, the amplified fragment may be labeled 
and used to screen a bacteriophage cDNA library. Alternatively, the labeled fragment may be
used to screen a genomic library.

PCR technology may also be utilized to isolate full length cDNA sequences. For 
example, RNA may be isolated, following standard procedures, from an appropriate cellular 
or tissue source. A reverse transcription reaction may be performed on the RNA using an 
oligonucleotide primer specific for the most 5' end of the amplified fragment for the priming 
of first strand synthesis. The resulting RNA/DNA hybrid may then be "tailed" with guanines 
using a standard terminal transferase reaction, the hybrid may be digested with RNase H,
and second strand synthesis may then be primed with a poly-C primer. Thus, cDNA
sequences upstream of the amplified fragment may easily be isolated. For a review of cloning
strategies which may be used, see, e.g., Sambrook et al., 1989, supra.

When using DNA probes derived from the inventive sequences for colony/plaque 
hybridization, one skilled in the art will recognize that by employing medium to high 
stringency conditions (e.g., hybridizing at 50-65°C in 5X SSPE and 50% formamide, and
washing at 50-65°C in 0.5X SSPE), sequences having regions with greater than 90%
sequence identity to the probe can be obtained, and that by employing lower stringency
conditions (e.g., hybridizing at 35-37°C in 5X SSPE and 40-45% formamide, and washing at
42°C in SSPE), sequences having regions with greater than 35-45% sequence identity to the
probe will be obtained. 

Suitably, genomic or cDNA libraries can be constructed and screened in accord with 
the previous paragraph. The libraries should be derived from a tissue or organism that is 
known to express the gene of interest, or that is suspected of expressing the gene. The clone 
containing the homolog may then be purified through methods routinely practiced in the art, 
and subjected to sequence analysis. 

Additionally, an expression library can be constructed utilizing DNA isolated from or 
cDNA synthesized from a tissue or organism that is known to express the gene of interest, or 
that is suspected of expressing the gene. In this manner, clones may be induced and screened 
using standard antibody screening techniques in conjunction with antibodies raised against the 
normal gene product, as described herein. (For screening techniques, see, for example,
Harlow, E. and Lane, D., 1988, ANTIBODIES: A LABORATORY MANUAL, Cold
Spring Harbor Press, Cold Spring Harbor, N.Y.)

Human Homologs 
Any organism or tissue can be used as the source for homologs of the present
invention so long as the organism or tissue naturally expresses such a protein or contains 
genes encoding the same. The most preferred organism for isolating homologs is human. 

PROTEINS OF THE INVENTION 

One class of proteins included within the invention is encoded by the inventive DNA 
molecules presented. Other proteins according to the invention are those encoded by the 
DNA variants described above. As noted, these variants are designed with the encoded 
proteins in mind. 

A preferred class of protein fragments includes those fragments which retain any 
biological activity. These molecules share functional features common to the family of proteins,
although these characteristics may vary in degree. 

According to one aspect of the invention, fragments of the inventive proteins are
contemplated. Some preferred fragments are those which are capable of eliciting an immune 
response. Generally these "antigenic" fragments will be from about five amino acids in 
length to about fifty amino acids in length. Some preferred antigenic fragments are from five 
to about twenty amino acids long. "Antigenic" response may refer to a T cell response, a B 
cell response or a response by cells of the macrophage/monocyte lineages. In most cases, 
however, it will refer to the immune response involved in the generation of antibodies. In 
other words, the relevant immune response is that of helper T cells and/or B cells. These 
preferred molecules comprise one or more T cell and/or B cell epitopes.
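A common way to look for linear antigenic fragments in this size range is to scan a protein with overlapping synthetic peptides. The Python sketch below generates such a set; the 15-residue window and 5-residue step defaults are illustrative assumptions, the function name is hypothetical, and the scan only enumerates candidate fragments without predicting which of them are immunogenic.

```python
def overlapping_peptides(protein: str, length: int = 15, step: int = 5) -> list[str]:
    """Overlapping windows covering the whole protein from N- to C-terminus."""
    if not 5 <= length <= 50:
        raise ValueError("window outside the 5-50 residue range described above")
    if step < 1:
        raise ValueError("step must be positive")
    if len(protein) <= length:
        return [protein]
    peptides = [protein[i:i + length] for i in range(0, len(protein) - length + 1, step)]
    # Add a final window if the step size left the C-terminal residues uncovered.
    if (len(protein) - length) % step:
        peptides.append(protein[-length:])
    return peptides


print(overlapping_peptides("ACDEFGHIKLMNPQRSTVWY", length=10, step=4))
# ['ACDEFGHIKL', 'FGHIKLMNPQ', 'KLMNPQRSTV', 'MNPQRSTVWY']
```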

ANTIBODIES OF THE INVENTION 

Antibodies raised against the proteins and protein fragments of the invention also are 
contemplated by the invention. Described below are antibody products and methods for 
producing antibodies capable of specifically recognizing one or more epitopes of the presently 
described proteins and their derivatives. 

Antibodies include, but are not limited to, polyclonal antibodies, monoclonal antibodies
(mAbs), humanized or chimeric antibodies, single chain antibodies including single chain Fv
(scFv) fragments, Fab fragments, F(ab')2 fragments, fragments produced by a Fab expression
library, anti-idiotypic (anti-Id) antibodies, epitope-binding fragments, and humanized forms of
any of the above. 
As known to one in the art, these antibodies may be used, for example, in the
detection of a target protein in a biological sample. They also may be utilized as part of 
treatment methods, and/or may be used as part of diagnostic techniques whereby patients may 
be tested for abnormal levels or for the presence of abnormal forms of such proteins.

In general, techniques for preparing polyclonal and monoclonal antibodies, as well as
hybridomas capable of producing the desired antibody, are well known in the art (Campbell,
A.M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St.
Groth et al., J. Immunol. Methods 35:1-21 (1980); Kohler and Milstein, Nature 256:495-497
(1975)). Such techniques include the trioma technique and the human B-cell hybridoma
technique (Kozbor et al., Immunology Today 4:72 (1983); Cole et al., in Monoclonal
Antibodies and Cancer Therapy, Alan R. Liss, Inc. (1985), pp. 77-96). Antibodies may also
be generated by the known techniques of phage display and in vitro immunization.

Polyclonal Antibodies 

Polyclonal antibodies are heterogeneous populations of antibody molecules derived 
from the sera of animals immunized with an antigen, such as an inventive protein or an 
antigenic derivative thereof. 

Polyclonal antiserum, containing antibodies to heterogeneous epitopes of a single 
protein, can be prepared by immunizing suitable animals with the expressed protein described 
above, which can be unmodified or modified, as known in the art, to enhance 
immunogenicity. Immunization methods include subcutaneous or intraperitoneal injection of
the polypeptide. 

Effective polyclonal antibody production is affected by many factors related both to
the antigen and to the host species. For example, small molecules tend to be less
immunogenic than larger ones and may require the use of carriers and/or adjuvants. In
addition, host animal response may vary with the site of inoculation. Either inadequate or
excessive doses of antigen may result in low titer antisera. In general, however, small doses
(high ng to low µg levels) of antigen administered at multiple intradermal sites appear to be
most reliable. Host animals include, but are not limited to, rabbits, mice, chickens and rats.
An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J.
Clin. Endocrinol. Metab. 33:988-991 (1971).

The protein immunogen may be modified or administered in an adjuvant in order to
increase the protein's antigenicity. Methods of increasing the antigenicity of a protein are
well known in the art and include, but are not limited to, coupling the antigen with a
heterologous protein (such as globulin or β-galactosidase) or the inclusion of an adjuvant
during immunization. Adjuvants include Freund's (complete and incomplete), mineral gels
such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols,
polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, dinitrophenol, and
potentially useful human adjuvants such as BCG (bacille Calmette-Guerin) and
Corynebacterium parvum.

Booster injections can be given at regular intervals, with at least one usually being
required for optimal antibody production. The antiserum may be harvested when the
antibody titer begins to fall. Titer may be determined semi-quantitatively, for example, by
double immunodiffusion in agar against known concentrations of the antigen. See, for
example, Ouchterlony et al., Chap. 19 in: Handbook of Experimental Immunology, Weir, ed.,
Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2
mg/ml of serum (on the order of 1 µM for IgG). The antiserum may be purified by affinity
chromatography using the immobilized immunogen carried on a solid support. Such methods
of affinity chromatography are well known in the art.
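Converting a mass concentration of antibody into a molar one only requires the molecular weight. The sketch below shows the arithmetic; the 150 kDa default is an approximate IgG mass and is an assumption, as is the function name.

```python
def mg_per_ml_to_micromolar(conc_mg_per_ml: float, mw_kda: float = 150.0) -> float:
    """Convert a protein concentration from mg/ml to micromolar (uM).

    1 mg/ml equals 1 g/L. Dividing by the molar mass (kDa x 1000 = g/mol)
    gives mol/L, and multiplying by 1e6 converts mol/L to uM.
    The 150 kDa default is an approximate IgG mass (assumption).
    """
    if mw_kda <= 0:
        raise ValueError("molecular weight must be positive")
    molar = conc_mg_per_ml / (mw_kda * 1000.0)
    return molar * 1e6


if __name__ == "__main__":
    for c in (0.1, 0.2):
        print(f"{c} mg/ml IgG ~ {mg_per_ml_to_micromolar(c):.2f} uM")
```

For the 0.1 to 0.2 mg/ml plateau range this gives roughly 0.67 to 1.33 µM for a 150 kDa IgG; a heavier antibody class at the same mass concentration would correspond to a proportionally lower molarity.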

Affinity of the antisera for the antigen may be determined by preparing competitive 
binding curves, as described, for example, by Fisher, Chap. 42 in: Manual of Clinical 
Immunology, second edition, Rose and Friedman, eds., Amer. Soc. for Microbiology,
Washington, D.C. (1980). 
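A competitive binding curve plots bound labeled antigen against increasing concentrations of unlabeled competitor. A four-parameter logistic is one common way to describe such curves and to read off the competitor concentration giving half-maximal inhibition (IC50). This is an illustrative sketch, not necessarily the analysis in the cited reference; IC50 reflects affinity only indirectly and depends on assay conditions. The function names and the synthetic data are assumptions.

```python
import math
from typing import Sequence


def competition_curve(conc: float, top: float, bottom: float,
                      ic50: float, hill: float = 1.0) -> float:
    """Four-parameter logistic: bound-label signal at a competitor concentration."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def interpolate_ic50(concs: Sequence[float], signals: Sequence[float]) -> float:
    """Estimate IC50 by interpolating on log10(concentration) between the two
    measured points that bracket the half-maximal signal.

    Assumes positive, ascending concentrations and a signal that falls as
    competitor concentration rises.
    """
    top, bottom = max(signals), min(signals)
    half = (top + bottom) / 2.0
    points = list(zip(concs, signals))
    for (c1, s1), (c2, s2) in zip(points, points[1:]):
        if s1 >= half >= s2 and s1 != s2:
            frac = (s1 - half) / (s1 - s2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10.0 ** log_c
    raise ValueError("half-maximal signal is not bracketed by the data")


if __name__ == "__main__":
    concs = [10.0 ** k for k in range(-3, 4)]  # 0.001 ... 1000 (arbitrary units)
    signals = [competition_curve(c, top=100.0, bottom=0.0, ic50=2.0) for c in concs]
    print(f"interpolated IC50 ~ {interpolate_ic50(concs, signals):.2f}")
```

With decade-spaced points the interpolated value (about 2.1 here) only approximates the true IC50 of 2; fitting the logistic to all points by nonlinear least squares is the more usual choice when precision matters.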

In addition to using protein as the immunogen, DNA molecules may be used directly.
In this manner, a DNA encoding the protein immunogen is administered. Boosting and
harvesting are done in a manner analogous to that detailed above. Yet another method of
producing antibodies entails immunizing chickens and harvesting the antibodies from their
eggs.

Monoclonal Antibodies 

Monoclonal antibodies (MAbs) are homogeneous populations of antibodies to a
particular antigen. They may be obtained by any technique which provides for the production
of antibody molecules by continuous cell lines in culture or in vivo. MAbs may be produced
by making hybridomas, which are immortalized cells capable of secreting a specific
monoclonal antibody.

Monoclonal antibodies to any of the proteins, peptides and epitopes thereof described
herein can be prepared from murine hybridomas according to the classical method of Kohler,
G. and Milstein, C., Nature 256:495-497 (1975) (and U.S. Patent No. 4,376,110) or
modifications thereof, such as the human B-cell hybridoma technique (Kozbor
et al., 1983, Immunology Today 4:72; Cole et al., 1983, Proc. Natl. Acad. Sci. USA
80:2026-2030), and the EBV-hybridoma technique (Cole et al., 1985, MONOCLONAL
ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc., pp. 77-96).

In one method, a mouse is repeatedly inoculated with a few micrograms of the
selected protein over a period of a few weeks. The mouse is then sacrificed, and the
antibody-producing cells of the spleen are isolated.

The spleen cells are fused, typically using polyethylene glycol, with mouse myeloma
cells, such as SP2/0-Ag14 myeloma cells. The excess, unfused cells are destroyed by growth
of the system on selective medium comprising aminopterin (HAT medium). The successfully
fused cells are diluted, and aliquots are plated to microtiter plates where growth is continued.
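The dilution step is often planned as limiting dilution, in which the number of cells per well is modeled with a Poisson distribution. The sketch computes, for a chosen mean number of cells per well, the expected fraction of empty wells and the probability that a growing well arose from a single cell. The Poisson model, the assumption that every seeded cell grows out, and the function names are illustrative assumptions, not taken from this specification.

```python
import math
from typing import Dict


def limiting_dilution_stats(mean_cells_per_well: float) -> Dict[str, float]:
    """Poisson expectations for wells seeded at a mean of `mean_cells_per_well` cells.

    Assumes cells are distributed randomly (Poisson) and that every seeded
    cell grows into a visible colony; both are simplifications.
    """
    lam = mean_cells_per_well
    if lam <= 0:
        raise ValueError("mean cells per well must be positive")
    p_empty = math.exp(-lam)          # P(0 cells)
    p_single = lam * math.exp(-lam)   # P(exactly 1 cell)
    p_occupied = 1.0 - p_empty        # P(at least 1 cell)
    return {
        "empty": p_empty,
        "single_cell": p_single,
        "occupied": p_occupied,
        # Probability that a well showing growth is monoclonal.
        "monoclonal_given_growth": p_single / p_occupied,
    }


def cells_per_ml_for_target(mean_cells_per_well: float, well_volume_ml: float) -> float:
    """Cell density to prepare so that each aliquot carries the target mean."""
    return mean_cells_per_well / well_volume_ml


if __name__ == "__main__":
    print(limiting_dilution_stats(0.5))
    print(cells_per_ml_for_target(0.5, 0.1), "cells/ml for 0.1 ml aliquots")
```

At an average of 0.5 cells per well, about 61% of wells are expected to be empty and roughly 77% of growing wells to be monoclonal; lowering the mean raises monoclonality at the cost of more empty wells.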

Antibody-producing clones (hybridomas) are identified by detection of antibody in the
supernatant fluid of the wells by immunoassay procedures. These include ELISA, as
originally described by Engvall, Meth. Enzymol. 70:419 (1980), western blot analysis,
radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)) and modified methods
thereof.
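When hybridoma supernatants are screened by ELISA, a plate reader yields one absorbance per well, and a cutoff separates positive from negative wells. The sketch uses the mean plus three standard deviations of negative-control wells, which is one widely used convention rather than a rule from this specification; the well labels, readings and function name are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def call_positive_wells(
    readings: Dict[str, float], negative_controls: List[float], n_sd: float = 3.0
) -> Tuple[float, List[str]]:
    """Return (cutoff, sorted well IDs whose signal exceeds the cutoff).

    cutoff = mean(negative controls) + n_sd * sample SD(negative controls).
    At least two negative-control readings are needed for a standard deviation.
    """
    if len(negative_controls) < 2:
        raise ValueError("need at least two negative-control wells")
    cutoff = mean(negative_controls) + n_sd * stdev(negative_controls)
    positives = sorted(well for well, od in readings.items() if od > cutoff)
    return cutoff, positives


if __name__ == "__main__":
    plate = {"A1": 0.92, "A2": 0.06, "B1": 0.31, "B2": 0.07}  # hypothetical absorbances
    cutoff, hits = call_positive_wells(plate, [0.05, 0.06, 0.04, 0.05])
    print(f"cutoff={cutoff:.3f}", hits)
```

Positive wells identified this way would then be expanded and retested before cloning.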

Selected positive clones can be expanded and their monoclonal antibody product
harvested for use. Detailed procedures for monoclonal antibody production are described in
Davis, L. et al., BASIC METHODS IN MOLECULAR BIOLOGY, Elsevier, New York,
Section 21-2 (1989). The hybridoma clones may be cultivated in vitro or in vivo, for instance
as ascites. Production of high titers of mAbs in vivo makes this the presently preferred
method of production. Alternatively, hybridoma culture in hollow fiber bioreactors provides
a continuous high yield source of monoclonal antibodies.

The antibody class and subclass may be determined using procedures known in the art 
(Campbell, A.M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry
and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)). 
MAbs may be of any immunoglobulin class including IgG, IgM, IgE, IgA, IgD and any
subclass thereof. Methods of purifying monoclonal antibodies are well known in the art. 

Antibody Derivatives and Fragments 

Fragments or derivatives of antibodies include any portion of the antibody which is 
capable of binding the target antigen, or a specific portion thereof. Antibody derivatives 
include poly-specific (e.g., bi-specific) antibodies, which contain binding sites specific for two 
or more different epitopes. These epitopes may be from the same or different inventive 
molecules, or one or more epitopes may be from a molecule not specifically disclosed here.

Antibody fragments specifically include F(ab')2, Fab, Fab' and Fv fragments. These
can be generated from any class of antibody, but typically are made from IgG or IgM. They
may be made by conventional recombinant DNA techniques or, using the classical method, by
proteolytic digestion with papain or pepsin. See CURRENT PROTOCOLS IN
IMMUNOLOGY, chapter 2, Coligan et al., eds. (John Wiley & Sons 1991-92).

F(ab')2 fragments are typically about 110 kDa (IgG) or about 150 kDa (IgM) and
contain two antigen-binding regions, joined at the hinge by disulfide bond(s). Virtually all, if
not all, of the Fc is absent in these fragments. Fab' fragments are typically about 55 kDa
(IgG) or about 75 kDa (IgM) and can be formed, for example, by reducing the disulfide
bond(s) of an F(ab')2 fragment. The resulting free sulfhydryl group(s) may be used to
conveniently conjugate Fab' fragments to other molecules, such as detection reagents (e.g.,
enzymes).

Fab fragments are monovalent and usually are about 50 kDa (from any source). Fab
fragments include the light (L) and heavy (H) chain variable (VL and VH, respectively) and
constant (CL and CH, respectively) regions of the antigen-binding portion of the antibody. The
H and L portions are linked by an interchain disulfide bridge.

Fv fragments are typically about 25 kDa (regardless of source) and contain the
variable regions of both the light and heavy chains (VL and VH, respectively). Usually, the VL
and VH chains are held together only by non-covalent interactions and, thus, they readily
dissociate. They do, however, have the advantage of small size, and they retain the same
binding properties as the larger Fab fragments. Accordingly, methods have been developed
to crosslink the VL and VH chains, using, for example, glutaraldehyde (or other chemical
crosslinkers), intermolecular disulfide bonds (by incorporation of cysteines) and peptide
linkers. When a peptide linker is used, the resulting Fv is a single chain (i.e., scFv).

Other antibody derivatives include single chain antibodies (U.S. Patent 4,946,778;
Bird, Science 242:423-426 (1988); Huston et al., Proc. Natl. Acad. Sci. USA 85:5879-5883
(1988); and Ward et al., Nature 334:544-546 (1989)). Single chain antibodies are formed by
linking the heavy and light chain fragments of the Fv region via an amino acid bridge,
resulting in a single chain Fv (scFv).
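At the protein level, an scFv is a VH domain, a flexible peptide bridge and a VL domain expressed as one polypeptide (the VL-linker-VH order is also possible). The sketch assembles such a sequence and estimates its size with the rule-of-thumb average of about 110 Da per residue. The 15-residue (Gly4Ser)3 linker is a commonly used choice assumed here for illustration; the function names and the placeholder domain strings are hypothetical and not real V-region sequences.

```python
# Commonly used flexible linker, assumed here for illustration: (Gly4Ser)3, 15 residues.
GLY4SER_X3 = "GGGGS" * 3
# Rule-of-thumb average mass of an amino acid residue in a protein.
AVERAGE_RESIDUE_MASS_DA = 110.0


def assemble_scfv(vh: str, vl: str, linker: str = GLY4SER_X3, vh_first: bool = True) -> str:
    """Join VH and VL domain sequences through a peptide linker into one chain."""
    for name, seq in (("VH", vh), ("VL", vl), ("linker", linker)):
        if not seq or not seq.isalpha():
            raise ValueError(f"{name} must be a non-empty one-letter amino acid string")
    first, second = (vh, vl) if vh_first else (vl, vh)
    return (first + linker + second).upper()


def approx_mass_kda(protein: str) -> float:
    """Rough molecular mass estimate from residue count alone."""
    return len(protein) * AVERAGE_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    # Placeholder strings standing in for real ~118- and ~107-residue V domains.
    scfv = assemble_scfv("Q" * 118, "D" * 107)
    print(len(scfv), f"{approx_mass_kda(scfv):.1f} kDa")  # 240 26.4 kDa
```

The estimate of about 26 kDa is in line with the roughly 25 kDa Fv size given above plus the short linker.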

One preferred method involves the generation of scFvs by recombinant methods, 
which allows the generation of Fvs with new specificities by mixing and matching variable 
chains from different antibody sources. In a typical method, a recombinant vector would be 
provided which comprises the appropriate regulatory elements driving expression of a cassette 
region. The cassette region would contain a DNA encoding a peptide linker, with convenient 
sites at both the 5' and 3' ends of the linker for generating fusion proteins. The DNA
encoding a variable region(s) of interest may be cloned in the vector to form fusion proteins 
with the linker, thus generating an scFv. 
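When VH and VL DNAs are cloned on either side of the linker, each segment, including any restriction-site codons kept at the junctions, must preserve the reading frame so that translation runs through the whole fusion. The sketch checks segment lengths, translates the fusion with the standard genetic code and rejects in-frame stop codons. The helper names and the example linker DNA encoding (Gly4Ser)3 are assumptions for illustration.

```python
from itertools import product

# Standard genetic code, generated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def build_scfv_dna(vh_dna: str, linker_dna: str, vl_dna: str) -> str:
    """Join VH, linker and VL DNA, checking frame and in-frame stops."""
    for name, segment in (("VH", vh_dna), ("linker", linker_dna), ("VL", vl_dna)):
        if len(segment) % 3:
            raise ValueError(f"{name} segment would shift the reading frame")
    fusion = (vh_dna + linker_dna + vl_dna).upper()
    if "*" in translate(fusion):
        raise ValueError("fusion contains an in-frame stop codon")
    return fusion


if __name__ == "__main__":
    linker = "GGTGGTGGTGGTTCT" * 3  # encodes GGGGS three times
    fusion = build_scfv_dna("CAGGTG", linker, "GACATC")  # toy VH/VL fragments
    print(translate(fusion))  # QVGGGGSGGGGSGGGGSDI
```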

In an exemplary alternative approach, DNAs encoding the two variable regions (VH and VL)
may be ligated to the DNA encoding the linker, and the resulting tripartite fusion may be
ligated directly into a conventional expression vector. The scFv DNAs generated by any of
these methods may be expressed in prokaryotic or eukaryotic cells, depending on the vector chosen.

Antibody fragments which recognize specific epitopes may be generated by known
techniques. For example, such fragments include but are not limited to: the F(ab')2 fragments
which can be produced by pepsin digestion of the antibody molecule and the Fab fragments
which can be generated by reducing the disulfide bridges of the F(ab')2 fragments.
Alternatively, Fab expression libraries may be constructed (Huse et al., 1989, Science,
246:1275-1281) to allow rapid and easy identification of monoclonal Fab fragments with the
desired specificity.

Derivatives also include "chimeric antibodies" (Morrison et al., Proc. Natl. Acad.
Sci. USA 81:6851-6855 (1984); Neuberger et al., Nature 312:604-608 (1984); Takeda et al.,
Nature 314:452-454 (1985)). These chimeras are made by splicing the DNA encoding a
mouse antibody molecule of appropriate specificity with, for instance, DNA encoding a
human antibody molecule of appropriate specificity. Thus, a chimeric antibody is a molecule
in which different portions are derived from different animal species, such as those having a
variable region derived from a murine mAb and a human immunoglobulin constant region.
These are also sometimes known as "humanized" antibodies, and they offer the added
advantage of at least partial shielding from the human immune system. They are, therefore,
particularly useful in therapeutic in vivo applications.

Labeled Antibodies 

The present invention further provides the above-described antibodies in detectably
labeled form. Antibodies can be detectably labeled through the use of radioisotopes, affinity
labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline
phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms,
etc. Procedures for accomplishing such labeling are well known in the art (see, for example,
Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer et al., Meth. Enzym.
62:308 (1979); Engval et al., Immunol. 109:129 (1972); Goding, J. Immunol. Meth. 13:215
(1976)). The labeled antibodies of the present invention can be used for in vitro, in vivo, and
in situ diagnostic assays.

Immobilized Antibodies 

The foregoing antibodies also may be immobilized on a solid support. Examples of
such solid supports include plastics such as polycarbonate, complex carbohydrates such as
agarose and sepharose, acrylic resins such as polyacrylamide, and latex beads.
Techniques for coupling antibodies to such solid supports are well known in the art (Weir et
al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific Publications,
Oxford, England, Chapter 10 (1986); Jacoby et al., Meth. Enzym. 34, Academic Press, N.Y.
(1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo,
and in situ assays as well as for immunoaffinity purification of the proteins of the present
invention.

THERAPEUTIC AND DIAGNOSTIC COMPOSITIONS 

The proteins, antibodies and polynucleotides of the present invention can be
formulated according to known methods to prepare pharmaceutically useful compositions,
whereby these materials, or their functional derivatives, are combined in admixture with a
pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive
of other human proteins, e.g., human serum albumin, are described, for example, in
Remington's Pharmaceutical Sciences (16th ed., Osol, A., Ed., Mack, Easton, PA (1980)). In
order to form a pharmaceutically acceptable composition suitable for effective administration,
such compositions will contain an effective amount of one or more of the agents of the present
invention, together with a suitable amount of carrier vehicle.

Pharmaceutical compositions for use in accordance with the present invention may be 
formulated in conventional manner using one or more physiologically acceptable carriers or 
excipients. Thus, the compounds and their physiologically acceptable salts and solvates may
be formulated for administration by inhalation or insufflation (either through the mouth or the 
nose) or oral, buccal, parenteral or rectal administration. 

For oral administration, the pharmaceutical compositions may take the form of, for
example, tablets or capsules prepared by conventional means with pharmaceutically
acceptable excipients such as binding agents (e.g., pregelatinised maize starch,
polyvinylpyrrolidone or hydroxypropyl methylcellulose); fillers (e.g., lactose,
microcrystalline cellulose or calcium hydrogen phosphate); lubricants (e.g., magnesium
stearate, talc or silica); disintegrants (e.g., potato starch or sodium starch glycolate); or
wetting agents (e.g., sodium lauryl sulphate). The tablets may be coated by methods well
known in the art. Liquid preparations for oral administration may take the form of, for
example, solutions, syrups or suspensions, or they may be presented as a dry product for
constitution with water or other suitable vehicle before use. Such liquid preparations may be
prepared by conventional means with pharmaceutically acceptable additives such as
suspending agents (e.g., sorbitol syrup, cellulose derivatives or hydrogenated edible fats);
emulsifying agents (e.g., lecithin or acacia); non-aqueous vehicles (e.g., almond oil, oily
esters, ethyl alcohol or fractionated vegetable oils); and preservatives (e.g., methyl or propyl
p-hydroxybenzoates or sorbic acid). The preparations may also contain buffer salts,
flavoring, coloring and sweetening agents as appropriate.

Preparations for oral administration may be suitably formulated to give controlled 
release of the active compound. For buccal administration the composition may take the form 
of tablets or lozenges formulated in conventional manner. 

For administration by inhalation, the compounds for use according to the present 
invention are conveniently delivered in the form of an aerosol spray presentation from 
pressurized packs or a nebuliser, with the use of a suitable propellant, e.g., 
dichlorodifluoromethane, trichlorofluoromethane, dichlorotetrafluoroethane, carbon dioxide 
or other suitable gas. In the case of a pressurized aerosol the dosage unit may be determined 
by providing a valve to deliver a metered amount. Capsules and cartridges of, e.g., gelatin for
use in an inhaler or insufflator may be formulated containing a powder mix of the compound
and a suitable powder base such as lactose or starch. 

The compounds may be formulated for parenteral administration by injection, e.g., by 
bolus injection or continuous infusion. Formulations for injection may be presented in unit 
dosage form, e.g., in ampules or in multi-dose containers, with an added preservative. The
compositions may take such forms as suspensions, solutions or emulsions in oily or aqueous
vehicles, and may contain formulatory agents such as suspending, stabilizing and/or
dispersing agents. Alternatively, the active ingredient may be in powder form for constitution
with a suitable vehicle, e.g., sterile pyrogen-free water, before use.

The compounds may also be formulated in rectal compositions such as suppositories 
or retention enemas, e.g., containing conventional suppository bases such as cocoa butter or
other glycerides. 

In addition to the formulations described previously, the compounds may also be 
formulated as a depot preparation. Such long acting formulations may be administered by 
implantation (for example subcutaneously or intramuscularly) or by intramuscular injection. 
Thus, for example, the compounds may be formulated with suitable polymeric or 
hydrophobic materials (for example as an emulsion in an acceptable oil) or ion exchange 
resins, or as sparingly soluble derivatives, for example, as a sparingly soluble salt. 

The compositions may, if desired, be presented in a pack or dispenser device which 
may contain one or more unit dosage forms containing the active ingredient. The pack may 
for example comprise metal or plastic foil, such as a blister pack. The pack or dispenser 
device may be accompanied by instructions for administration. 

RECOMBINANT CONSTRUCTS AND EXPRESSION 

The present invention further provides recombinant DNA constructs comprising one 
or more of the nucleotide sequences of the present invention. The recombinant constructs of 
the present invention comprise a vector, such as a plasmid or viral vector, into which a DNA 
or DNA fragment, typically bearing an open reading frame, is inserted, in either orientation. 

The gene products encoded by the subject DNAs may be produced by recombinant
DNA technology using techniques well known in the art. See, for example, the techniques
described in Sambrook et al., 1989, supra, and Ausubel et al., 1989, supra. Alternatively,
the DNA sequences may be chemically synthesized using, for example, automated DNA
synthesizers. See, for example, the techniques described in OLIGONUCLEOTIDE
SYNTHESIS, 1984, Gait, ed., IRL Press, Oxford, which is incorporated by reference herein
in its entirety. They may be assembled from fragments and short oligonucleotide linkers, or
from a series of oligonucleotides. They are preferably made by RT-PCR methods. The
resulting synthetic gene is capable of being expressed in a recombinant vector.

In some cases the recombinant constructs will be expression vectors, which are 
capable of expressing the RNA and/or protein products of the encoded DNA(s). Thus, the 
vector may further comprise regulatory sequences, including for example, a promoter, 
operably linked to the open reading frame (ORF). The vector may further comprise a 
selectable marker sequence. 

Specific initiation signals may also be required for efficient translation of inserted
target gene coding sequences. These signals include the ATG initiation codon and adjacent
sequences. In cases where a target DNA that includes its own initiation codon and adjacent
sequences is inserted into the appropriate expression vector, no additional translation control
signals may be needed. However, in cases where only a portion of an ORF is used,
exogenous translational control signals, including, perhaps, the ATG initiation codon, must be
provided. Furthermore, the initiation codon must be in phase with the reading frame of the
desired coding sequence to ensure translation of the entire target. These exogenous
translational control signals and initiation codons can be of a variety of origins, both natural
and synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate
transcription enhancer elements, transcription terminators, etc. (see Bittner et al., Methods in
Enzymol. 153:516-544 (1987)). Some appropriate cloning and expression vectors for use
with prokaryotic and eukaryotic hosts are described by Sambrook et al., in Molecular
Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, New York (1989), the
disclosure of which is hereby incorporated by reference.
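The requirement that the initiation codon be in phase with the coding sequence can be checked mechanically: the distance from the ATG to the first codon of the insert must be a multiple of three, with no stop codon in between. The sketch below does this and also locates the first ATG-initiated open reading frame in a sequence. It assumes the standard stop codons TAA, TAG and TGA; the function names and toy sequences are hypothetical.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def in_phase(construct: str, atg_pos: int, insert_pos: int) -> bool:
    """True if the insert's first codon (at insert_pos) is in frame with the
    ATG at atg_pos and no stop codon lies between them."""
    seq = construct.upper()
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG initiation codon at atg_pos")
    if insert_pos < atg_pos or (insert_pos - atg_pos) % 3:
        return False
    return all(seq[i:i + 3] not in STOP_CODONS for i in range(atg_pos, insert_pos, 3))


def find_orf(dna: str, min_codons: int = 1) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first ATG-initiated ORF, 0-based, end exclusive,
    stop codon included; None if there is no complete ORF."""
    seq = dna.upper()
    start = seq.find("ATG")
    while start != -1:
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    return start, i + 3
                break  # stop reached too early; try the next ATG
        start = seq.find("ATG", start + 1)
    return None


if __name__ == "__main__":
    construct = "ATGGCTGGATCCAAAGAA"  # toy: ATG plus three vector codons, insert at 12
    print(in_phase(construct, 0, 12), in_phase(construct, 0, 11))  # True False
    print(find_orf("CCATGAAATTTTAGCC"))  # (2, 14)
```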

If desired, to enhance expression and facilitate proper protein folding, the codon 
context and codon pairing of the sequence may be optimized for the particular expression 
organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767.

The present invention further provides host cells containing at least one of the DNAs 
of the present invention. The host cell can be virtually any cell for which expression vectors 
are available. It may be, for example, a higher eukaryotic host cell, such as a mammalian 
cell, a lower eukaryotic host cell, such as a yeast cell, or the host cell can be a prokaryotic 
cell, such as a bacterial cell. Introduction of the recombinant construct into the host cell can
be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, or
electroporation (Davis et al., Basic Methods in Molecular Biology (1986)).

A wide variety of expression systems are available, such as: yeast (e.g.,
Saccharomyces, Pichia) transformed with recombinant yeast expression vectors containing the
target DNA; insect cell systems infected with recombinant virus expression vectors (e.g.,
baculovirus) containing the target DNA sequences; plant cell systems infected with
recombinant virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic
virus, TMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid)
containing target DNA coding sequences; or mammalian cell systems (e.g., COS, CHO,
BHK, 293, 3T3) harboring recombinant expression constructs containing promoters derived
from the genome of mammalian cells (e.g., metallothionein promoter) or from mammalian
viruses (e.g., the adenovirus late promoter; the vaccinia virus 7.5K promoter).

Depending on the system chosen, the resulting product may differ. For example,
proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation
modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern
different from that of proteins expressed in mammalian cells.

Vectors 

Generally, recombinant expression vectors will include origins of replication and
selectable markers permitting selection of the host cell, e.g., the ampicillin resistance gene of
E. coli and the S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed gene
to direct transcription of a downstream structural sequence. Such promoters can be derived
from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK),
α-factor, acid phosphatase, or heat shock proteins, among others. The heterologous
structural sequence is assembled in appropriate phase with translation initiation and
termination sequences, and in one aspect of the invention, a leader sequence capable of
directing secretion of translated protein into the periplasmic space or extracellular medium.
Optionally, the heterologous sequence can encode a fusion protein including an N-terminal or
C-terminal identification peptide imparting desired characteristics, e.g., stabilization or
simplified purification of expressed recombinant product.

Bacterial Expression

Useful expression vectors for bacterial use are constructed by inserting a structural 
DNA sequence encoding a desired protein together with suitable translation initiation and 
termination signals in operable reading phase with a functional promoter. The vector will 
comprise one or more phenotypic selectable markers and an origin of replication to ensure 
maintenance of the vector and, if desirable, to provide amplification within the host. Suitable 
prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella 
typhimurium and various species within the genera Pseudomonas, Streptomyces, and 
Staphylococcus, although others may also be employed as a matter of choice.

Bacterial vectors may be, for example, bacteriophage-, plasmid- or cosmid-based. 
These vectors can comprise a selectable marker and bacterial origin of replication derived 
from commercially available plasmids typically containing elements of the well known 
cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, 
GEM 1 (Promega Biotec, Madison, WI, USA), pBs, phagescript, PsiX174, pBluescript SK, 
pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, 
pKK232-8, pDR540, and pRIT5 (Pharmacia). 

These "backbone" sections are combined with an appropriate promoter and the
structural sequence to be expressed. Bacterial promoters include lac, T3, T7, lambda PR or
PL, trp and ara.

Following transformation of a suitable host strain and growth of the host strain to an 
appropriate cell density, the selected promoter is derepressed/induced by appropriate means 
(e.g., temperature shift or chemical induction) and cells are cultured for an additional period. 
Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and 
the resulting crude extract retained for further purification. 

In bacterial systems, a number of expression vectors may be advantageously selected
depending upon the use intended for the protein being expressed. For example, when a large
quantity of such a protein is to be produced, for the generation of antibodies or to screen
peptide libraries, vectors which direct the expression of high levels of fusion
protein products that are readily purified may be desirable. Such vectors include, but are not
limited to, the E. coli expression vector pUR278 (Ruther et al., 1983, EMBO J. 2:1791), in
which the coding sequence may be ligated into the vector in frame with the lacZ coding
region so that a fusion protein is produced; pIN vectors (Inouye et al., 1985, Nucleic Acids
Res. 13:3101-3109; Van Heeke et al., 1989, J. Biol. Chem. 264:5503-5509); pET vectors
(Studier et al., Methods in Enzymology 185:60-89 (Academic Press 1990)); and the like.

Moreover, pGEX vectors may be used to express foreign polypeptides as fusion 
proteins with glutathione S-transferase (GST). In general, such fusion proteins are soluble 
and easily can be purified from lysed cells by adsorption to glutathione-agarose beads 
followed by elution in the presence of free glutathione. The pGEX vectors are designed to 
include thrombin or factor Xa protease cleavage sites so that the cloned target gene protein 
can be released from the GST moiety. 
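The protease sites mentioned above are short recognition sequences, so their position in a fusion protein can be located by a simple string search. The sketch splits a fusion at the first thrombin (LVPR|GS) or factor Xa (IEGR|) site. It is a simplification, since real protease specificity is broader than an exact string match, and the function name and toy sequences are hypothetical. With the thrombin site, the released protein keeps the Gly-Ser residues on the C-terminal side of the cut.

```python
from typing import Dict, Tuple

# Recognition motif and number of motif residues left on the N-terminal side of the cut.
PROTEASE_SITES: Dict[str, Tuple[str, int]] = {
    "thrombin": ("LVPRGS", 4),   # cleaves LVPR|GS
    "factor_xa": ("IEGR", 4),    # cleaves IEGR|X
}


def cleave_fusion(fusion: str, protease: str = "thrombin") -> Tuple[str, str]:
    """Split a fusion protein at the first site for `protease`.

    Returns (sequence N-terminal to the cut, sequence C-terminal to the cut).
    """
    motif, offset = PROTEASE_SITES[protease]
    idx = fusion.find(motif)
    if idx == -1:
        raise ValueError(f"no {protease} site found")
    cut = idx + offset
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    carrier, target = cleave_fusion("MSPILGYWKLVPRGSMKTAYIAK")  # toy sequence
    print(carrier, target)
```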

In one embodiment, full length cDNA sequences are appended with in-frame BamHI
sites at the amino terminus and EcoRI sites at the carboxyl terminus using standard PCR
methodologies (Innis et al., 1990, supra) and ligated into the pGEX-2TK vector (Pharmacia,
Uppsala, Sweden). The resulting cDNA construct contains a kinase recognition site at the
amino terminus for radioactive labeling and glutathione S-transferase sequences at the
carboxyl terminus for affinity purification (Nilsson et al., 1985, EMBO J. 4:1075; Zabeau
and Stanley, 1982, EMBO J. 1:1217).
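Appending BamHI (GGATCC) and EcoRI (GAATTC) sites by PCR amounts to adding them to the 5' ends of the forward and reverse primers. The sketch builds such a primer pair from an ORF. The 20-nucleotide annealing length, the short 5' clamp and the function names are arbitrary assumptions, and whether the BamHI site is read in frame depends on the particular vector, so the frame assumption in the code (insert codons begin immediately after GGATCC) should be checked against the vector map. The reverse primer here keeps the ORF's own stop codon.

```python
from typing import Tuple

BAMHI = "GGATCC"
ECORI = "GAATTC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (normalized to uppercase)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def cloning_primers(orf: str, anneal_len: int = 20, clamp: str = "CG") -> Tuple[str, str]:
    """Return (forward, reverse) primers adding BamHI 5' and EcoRI 3' of an ORF.

    Assumes the vector reads the BamHI site as codons GGA TCC, so the ORF
    follows it directly in frame (check the actual vector map).
    """
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("ORF length is not a multiple of three")
    if len(orf) < anneal_len:
        raise ValueError("ORF is shorter than the annealing region")
    for name, site in (("BamHI", BAMHI), ("EcoRI", ECORI)):
        if site in orf:
            raise ValueError(f"internal {name} site would be cut during cloning")
    forward = clamp + BAMHI + orf[:anneal_len]
    reverse = clamp + ECORI + reverse_complement(orf[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    toy_orf = "ATG" + "GCT" * 10 + "TAA"  # hypothetical 36-nt ORF
    fwd, rev = cloning_primers(toy_orf, anneal_len=18)
    print(fwd)
    print(rev)
```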

Eukaryotic Expression 

Various mammalian cell culture systems can also be employed to express recombinant 
protein. Examples of mammalian expression systems include the COS-7 lines of monkey 
kidney fibroblasts, described by Gluzman, Cell 23:175 (1981), and other cell lines capable of 
expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell 
lines. Mammalian expression vectors will comprise an origin of replication, a suitable 
promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site,
splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking 
nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for 
example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be 
used to provide the required nontranscribed genetic elements. 

Mammalian promoters include CMV immediate early, HSV thymidine kinase, early
and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Exemplary mammalian
vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG,
and pSVL (Pharmacia). Selectable markers include CAT (chloramphenicol acetyltransferase).

In mammalian host cells, a number of viral-based expression systems may be utilized. 
In cases where an adenovirus is used as an expression vector, the coding sequence of interest 
may be ligated to an adenovirus transcription/translation control complex, e.g., the late
promoter and tripartite leader sequence. This chimeric gene may then be inserted in the
adenovirus genome by in vitro or in vivo recombination. Insertion in a non-essential region of
the viral genome (e.g., region E1 or E3) will result in a recombinant virus that is viable and
capable of expressing a target protein in infected hosts (see, e.g., Logan et al., 1984, Proc.
Natl. Acad. Sci. USA 81:3655-3659).

In one embodiment, cDNA sequences encoding the full-length open reading frames
are ligated into pCMVβ replacing the β-galactosidase gene such that cDNA expression is
driven by the CMV promoter (Alam, 1990, Anal. Biochem. 188:245-254; MacGregor et al.,
1989, Nucl. Acids Res. 17:2365; Norton et al., 1985, Mol. Cell. Biol. 5:281).

In addition, a host cell strain may be chosen which modulates the expression of the 
inserted sequences, or modifies and processes the gene product in the specific fashion desired. 
Such modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products 
may be important for the function of the protein. Different host cells have characteristic and 
specific mechanisms for the post-translational processing and modification of proteins. 

Appropriate cell lines or host systems can be chosen to ensure the correct modification 
and processing of the foreign protein expressed. To this end, eukaryotic host cells which 
possess the cellular machinery for proper processing of the primary transcript, glycosylation, 
and phosphorylation of the gene product may be used. Such mammalian host cells include 
but are not limited to CHO, VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, etc. 

For long-term, high-yield production of recombinant proteins in eukaryotic cells, 
stable expression is preferred. Rather than using expression vectors which contain viral 
origins of replication, host cells can be transformed with DNA controlled by appropriate 
expression control elements (e.g., promoter, enhancer sequences, transcription terminators,
polyadenylation sites, etc.), and a selectable marker. 

Following the introduction of the foreign DNA, engineered cells may be allowed to 
grow for 1-2 days in an enriched media, and then are switched to a selective media. The 
selectable marker in the recombinant plasmid confers resistance to the selection and allows 
cells to stably integrate the plasmid into their chromosomes and grow to form foci which in 
turn can be cloned and expanded into cell lines. This method may advantageously be used to 
engineer cell lines which express the target protein. Such engineered cell lines may be 
particularly useful in screening and evaluation of compounds that affect the endogenous
activity of the protein. 

A number of selection systems may be used. For example, the herpes simplex virus
thymidine kinase (Wigler et al., Cell 11:223 (1977)), hypoxanthine-guanine
phosphoribosyltransferase (Szybalska et al., Proc. Natl. Acad. Sci. USA 48:2026 (1962)), and
adenine phosphoribosyltransferase (Lowy et al., Cell 22:817 (1980)) genes can be employed
in tk⁻, hgprt⁻ or aprt⁻ cells, respectively. Also, antimetabolite resistance can be used as the
basis of selection for dhfr, which confers resistance to methotrexate (Wigler et al., Proc.
Natl. Acad. Sci. USA 77:3567 (1980); O'Hare et al., Proc. Natl. Acad. Sci. USA 78:1527
(1981)); gpt, which confers resistance to mycophenolic acid (Mulligan et al., Proc. Natl.
Acad. Sci. USA 78:2072 (1981)); neo, which confers resistance to the aminoglycoside G-418
(Colberre-Garapin et al., J. Mol. Biol. 150:1 (1981)); and hygro, which confers resistance to
hygromycin (Santerre et al., Gene 30:147 (1984)) genes.
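The selection systems above amount to a lookup from marker gene to the condition under which transfected cells are selected. The following is a minimal Python sketch of that lookup, assuming only the pairings stated in the text; the names DEFICIENT_HOST, RESISTANCE_AGENT and selection_plan are hypothetical, and no media recipes or drug concentrations are implied.

```python
# Hypothetical lookup of the selection systems described in the text.
# Only the marker/host and marker/drug pairings stated there are encoded.

# Markers that complement an enzyme-deficient host cell line.
DEFICIENT_HOST = {
    "tk": "tk-",        # herpes simplex virus thymidine kinase
    "hgprt": "hgprt-",  # hypoxanthine-guanine phosphoribosyltransferase
    "aprt": "aprt-",    # adenine phosphoribosyltransferase
}

# Markers that confer resistance to an antimetabolite or antibiotic.
RESISTANCE_AGENT = {
    "dhfr": "methotrexate",
    "gpt": "mycophenolic acid",
    "neo": "G-418",
    "hygro": "hygromycin",
}


def selection_plan(marker: str) -> str:
    """Return a one-line description of how cells carrying `marker` are selected."""
    key = marker.lower()
    if key in DEFICIENT_HOST:
        return (f"use {DEFICIENT_HOST[key]} host cells; only transfectants "
                f"expressing {key} survive selection")
    if key in RESISTANCE_AGENT:
        return f"grow transfectants in medium containing {RESISTANCE_AGENT[key]}"
    raise KeyError(f"marker not described in the text: {marker}")
```

For example, selection_plan("neo") describes growth in G-418-containing medium. Keeping the two families in separate tables mirrors the text's distinction between markers that require a deficient host and markers that act by drug resistance.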

An alternative fusion protein system allows for the ready purification of non-denatured
fusion proteins expressed in human cell lines (Janknecht et al., Proc. Natl. Acad. Sci. USA
88:8972-8976 (1991)). In this system, the gene of interest is subcloned into a vaccinia-based
plasmid such that the gene's open reading frame is translationally fused to an amino-terminal
tag consisting of six histidine residues. Extracts from cells infected with recombinant
vaccinia virus are loaded onto Ni²⁺-nitrilotriacetic acid-agarose columns, and histidine-tagged
proteins are selectively eluted with imidazole-containing buffers.
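At the sequence level, the amino-terminal fusion described above means placing six histidine codons in frame ahead of the open reading frame. A minimal Python sketch follows, assuming that the tag is ATG followed by six CAC codons and that it replaces the ORF's own start codon; the function names and the codon choice are illustrative, not taken from the cited system.

```python
# Hypothetical in-silico sketch: build an amino-terminal His6 fusion of an ORF.
# Assumption: the tag is ATG + six histidine codons placed in frame before
# the ORF's second codon, so the fusion reads Met-His6-(rest of protein).

HIS_CODON = "CAC"  # CAT would also encode histidine; CAC is an arbitrary choice


def n_terminal_his6(orf: str) -> str:
    """Return the coding sequence for Met-His6 fused in frame to `orf`."""
    seq = orf.upper()
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 to stay in frame")
    if not seq.startswith("ATG"):
        raise ValueError("ORF is expected to begin with an ATG start codon")
    # Replace the ORF's own start codon with ATG + 6 x His; the frame is preserved.
    return "ATG" + HIS_CODON * 6 + seq[3:]


def codons(seq: str) -> list:
    """Split a coding sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]
```

Checking that the fused length remains a multiple of three is the simplest guard against a frameshift between tag and ORF.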

In an insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is 
used as a vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. 
The target coding sequence may be cloned individually into non-essential regions (for 
example the polyhedrin gene) of the virus and placed under control of an AcNPV promoter 
(for example the polyhedrin promoter). Successful insertion of a target gene coding sequence 
will result in inactivation of the polyhedrin gene and production of non-occluded recombinant 
virus (i.e., virus lacking the proteinaceous coat coded for by the polyhedrin gene). These 
recombinant viruses are then used to infect Spodoptera frugiperda cells in which the inserted 
gene is expressed (see, e.g., Smith et al., 1983, J. Virol. 46:584; Smith, U.S. Patent No.
4,215,051).
While the present proteins can be expressed in recombinant systems, as described
above, cell-free translation systems can also be employed to produce such proteins using 
RNAs derived from the DNA constructs of the present invention. 

Purification of Recombinant Proteins 

Recombinant proteins produced may be isolated by host cell lysis. This may be 
followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography 
steps. Finally, high performance liquid chromatography (HPLC) can be employed for final 
purification steps. Microbial cells employed in expression of proteins can be disrupted by any 
convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use 
of cell lysing agents, like lysozyme and chelators. 

If inclusion bodies are formed in bacterial systems, they may be extracted from cell 
pellets using, for example, detergents, reducing agents, salts, urea, guanidinium chloride and 
extremes of pH (e.g. <4 or >10). If denaturation occurs, protein refolding steps (e.g., 
dialysis) can be used, as necessary, in completing configuration of the mature protein. If 
disulfide bridges are present in the native protein, they may be reoxidized using known 
methods. 

By way of specific non-limiting example, the recombinant bacterial cells, for example
E. coli, are grown in any of a number of suitable media, for example LB, and expression
of the recombinant protein is induced by adding IPTG (e.g., with a lac operator-promoter) to
the medium or by switching incubation to a higher temperature (e.g., with λ cI857). After
culturing the bacteria for
a further period of between 2 and 24 hours, the cells are collected by centrifugation and 
washed to remove residual media. The bacterial cells are then lysed, for example, by 
disruption in a cell homogenizer and centrifuged to separate the cell membranes from the 
soluble cell components. If the protein aggregates into inclusion bodies, this centrifugation 
can be performed under conditions whereby the dense inclusion bodies are selectively 
enriched by incorporation of sugars such as sucrose into the buffer and centrifugation at a 
selective speed. The inclusion bodies can then be washed in any of several solutions to 
remove some of the contaminating host proteins, then solubilized in solutions containing high 
concentrations of urea (e.g., 8 M) or chaotropic agents such as guanidinium hydrochloride in
the presence of reducing agents such as β-mercaptoethanol or DTT (dithiothreitol).

At this stage it may be advantageous to incubate the protein for several hours under 
conditions suitable for the protein to undergo a refolding process into a conformation which 
more closely resembles that of the native protein. Such conditions generally include low
protein concentrations (less than 500 µg/ml), low levels of reducing agent, concentrations of
urea less than 2 M and often the presence of reagents such as a mixture of reduced and
oxidized glutathione which facilitate the interchange of disulfide bonds within the protein
molecule. The refolding process can be monitored, for example, by SDS-PAGE or with
antibodies which are specific for the native molecule. Following refolding, the protein can
then be purified further and separated from the refolding mixture by chromatography on any
of several supports including ion exchange resins, gel permeation resins or on a variety of 
affinity columns. 
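The refolding limits quoted above (protein below 500 µg/ml, urea below 2 M), together with solubilization in 8 M urea, imply a minimum dilution factor. The following Python sketch computes that factor, assuming the refolding buffer itself contains no urea or protein; the function name is hypothetical.

```python
# Hypothetical helper: how far must a solubilized inclusion-body prep be diluted
# to reach the refolding limits quoted in the text (protein < 500 ug/ml, urea < 2 M)?
# Assumes the refolding buffer itself contains no urea and no protein.


def min_refolding_dilution(protein_mg_per_ml: float,
                           urea_molar: float,
                           max_protein_mg_per_ml: float = 0.5,
                           max_urea_molar: float = 2.0) -> float:
    """Smallest dilution factor that brings both protein and urea down to the limits."""
    if protein_mg_per_ml < 0 or urea_molar < 0:
        raise ValueError("concentrations must be non-negative")
    needed_for_protein = protein_mg_per_ml / max_protein_mg_per_ml
    needed_for_urea = urea_molar / max_urea_molar
    # A factor below 1 would mean concentrating, so never return less than 1.
    return max(needed_for_protein, needed_for_urea, 1.0)
```

For a 5 mg/ml preparation in 8 M urea, the protein limit dominates: a 10-fold dilution is needed, which also lowers urea to 0.8 M. Because the stated limits are strict ("less than"), a dilution somewhat beyond the returned factor would be used in practice.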

Labeling Proteins 

When used as a component in assay systems such as those described below, the target
protein may be labeled, either directly or indirectly, to facilitate detection of the present
res-like molecules either in vitro or in vivo. Any of a variety of suitable labeling systems may be
used including but not limited to radioisotopes such as ¹²⁵I; enzyme labeling systems that
generate a detectable colorimetric signal or light when exposed to substrate; and fluorescent 
labels. 

Where recombinant DNA technology is used for protein production, it may be
advantageous to engineer fusion proteins that can facilitate labeling, immobilization and/or 
detection. These fusion proteins may, for example, add amino acids which facilitate further 
chemical modification. They also may add a functional moiety, such as an enzyme, which 
directly facilitates detection. 

TRANSGENIC ANIMALS 

The invention further contemplates animal models for studying the function of the
present molecules and for overproducing the protein products. The disclosed DNA sequences 
may be used in conjunction with techniques for producing transgenic animals that are well 
known to those of skill in the art. 
To prepare transgenic animals, target gene sequences may for example be introduced
into, and overexpressed in, the genome of the animal of interest, or, if endogenous target
gene sequences are present, they may either be overexpressed or, alternatively, be disrupted
in order to underexpress or inactivate target gene expression, such as described for the
disruption of apoE in mice (Plump et al., Cell 71:343-353 (1992)).

In order to overexpress a target gene sequence, the coding portion of the target gene 
sequence may be ligated to a regulatory sequence which is capable of driving gene expression 
in the animal and cell type of interest. Such regulatory regions will be well known to those of 
skill in the art, and may be utilized in the absence of undue experimentation. 

For underexpression of an endogenous target gene sequence, such a sequence may be
isolated and engineered such that when reintroduced into the genome of the animal of interest,
the endogenous target gene alleles will be inactivated. Preferably, the engineered target gene
sequence is introduced via gene targeting such that the endogenous target sequence is
disrupted upon integration of the engineered target gene sequence into the animal's genome.

Animals of any species, including, but not limited to, mice, rats, rabbits, guinea pigs, 
pigs, micro-pigs, goats, and non-human primates, e.g., baboons, monkeys, and chimpanzees 
may be used to generate cardiovascular disease animal models. Goats, cows and sheep are 
particularly preferred for producing protein in vivo. 

Any technique known in the art may be used to introduce a target gene transgene into
animals to produce the founder lines of transgenic animals. Such techniques include, but are
not limited to, pronuclear microinjection (Hoppe et al., U.S. Pat. No. 4,873,191 (1989));
retrovirus-mediated gene transfer into germ lines (Van der Putten et al., Proc. Natl. Acad.
Sci. USA 82:6148-6152 (1985)); gene targeting in embryonic stem cells (Thompson et al.,
Cell 56:313-321 (1989)); electroporation of embryos (Lo, Mol. Cell. Biol. 3:1803-1814
(1983)); and sperm-mediated gene transfer (Lavitrano et al., Cell 57:717-723 (1989)).
For a review of such techniques, see Gordon, Transgenic Animals, Intl. Rev. Cytol.
115:171-229 (1989).

The present invention provides for transgenic animals that carry the transgene in all
their cells, as well as animals which carry the transgene in some, but not all, of their cells,
i.e., mosaic animals. The transgene may be integrated as a single transgene or in concatemers,
e.g., head-to-head tandems or head-to-tail tandems. The transgene may also be selectively
introduced into and activated in a particular cell type by following, for example, the teaching
of Lasko et al. (Proc. Natl. Acad. Sci. USA 89:6232-6236 (1992)). The
regulatory sequences required for such a cell-type specific activation will depend upon the 
particular cell type of interest, and will be apparent to those of skill in the art. When it is 
desired that the target gene be integrated into the chromosomal site of the endogenous target 
gene, gene targeting is preferred. Briefly, when such a technique is to be utilized, vectors 
containing some nucleotide sequences homologous to the endogenous target gene of interest 
are designed for the purpose of integrating, via homologous recombination with chromosomal 
sequences, into and disrupting the function of the nucleotide sequence of the endogenous 
target gene. 

The transgene may also be selectively introduced into a particular cell type, thus 
inactivating the endogenous gene of interest in only that cell type, by following, for example, 
the teaching of Gu et al. (Science 265:103-106 (1994)). The regulatory sequences required
for such a cell-type specific inactivation will depend upon the particular cell type of interest, 
and will be apparent to those of skill in the art. 

Once transgenic animals have been generated, the expression of the recombinant target 
gene and protein may be assayed utilizing standard techniques. Initial screening may be 
accomplished by Southern blot analysis or PCR techniques to analyze animal tissues to assay 
whether integration of the transgene has taken place. The level of mRNA expression of the 
transgene in the tissues of the transgenic animals may also be assessed using techniques which 
include but are not limited to Northern blot analysis of tissue samples obtained from the 
animal, in situ hybridization analysis, and RT-PCR. Samples of target gene-expressing
tissue may also be evaluated immunocytochemically using antibodies specific for the
product of the target gene transgene.

The transgenic animals that express target gene mRNA or target gene transgene 
peptide (detected immunocytochemically, using antibodies directed against the target gene 
product's epitopes) at easily detectable levels should then be further evaluated to identify those 
animals which display characteristic increased susceptibility to carcinogenesis. Additionally, 
specific cell types within the transgenic animals may be analyzed and assayed in vitro for
cellular phenotypes characteristic of the mutant phenotype.

Once target gene transgenic founder animals are produced, they may be bred, inbred, 
outbred, or crossbred to produce colonies of the particular animal. Examples of such 
breeding strategies include but are not limited to: outbreeding of founder animals with more 
than one integration site in order to establish separate lines; inbreeding of separate lines in
order to produce compound target gene transgenics that express the target gene transgene of 
interest at higher levels because of the effects of additive expression of each target gene 
transgene; crossing of heterozygous transgenic animals to produce animals homozygous for a 
given integration site in order both to augment expression and eliminate the possible need for 
screening of animals by DNA analysis; crossing of separate homozygous lines to produce 
compound heterozygous or homozygous lines; breeding animals to different inbred genetic 
backgrounds so as to examine effects of modifying alleles on expression of the target gene 
transgene and the possible development of carcinogenesis. One such approach is to cross the
target gene transgenic founder animals with a wild-type strain to produce an F1 generation
that exhibits increased susceptibility to carcinogenesis. The F1 generation may then be inbred
in order to develop a homozygous line, if it is found that homozygous target gene transgenic
animals are viable.
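The expected genotype ratios behind these breeding schemes follow from single-locus Mendelian segregation. The following Python sketch assumes one integration site per line, equal transmission of each allele, and no lethality; "T" marks the transgene-carrying allele and "-" its absence, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Single-locus model of the crosses described above.
# "T" = allele carrying the transgene at a given integration site, "-" = no transgene.


def cross(parent_a: str, parent_b: str) -> Counter:
    """Count offspring genotypes, each parent contributing one of its two alleles equally often."""
    # sorted() gives an order-independent genotype label, e.g. "-T" for either T/- or -/T.
    return Counter("".join(sorted(a + b)) for a, b in product(parent_a, parent_b))


def fraction(counts: Counter, genotype: str) -> float:
    """Expected fraction of offspring with `genotype`."""
    return counts["".join(sorted(genotype))] / sum(counts.values())
```

Crossing two heterozygous (hemizygous) animals gives the familiar 1:2:1 ratio, so about one quarter of offspring would be homozygous for the integration site, while a founder crossed with a wild-type strain passes the transgene to about half of the F1 generation.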

Methods of generating "knockout" mice using homologous recombination in 
embryonic stem cells are well known in the art. Suitable methods are described, for example, 
in Mansour et al., Nature 336:348 (1988); Zijlstra et al., Nature 342:435 (1989) and
344:742 (1990); and Hasty et al., Nature 350:243 (1991). Genomic DNA for the target gene can be
obtained by conventional methods using the cDNA sequence as a probe in a commercially
available genomic DNA library.

Briefly, a genomic fragment is cleaved with a restriction endonuclease and a 
heterologous cassette containing a neomycin-resistance gene is inserted at the cleavage site. A
suitable cassette is the GTI-II neo cassette described by Lufkin et al., Cell 66:1105 (1991).
The modified genomic fragment is cloned into a suitable targeting vector that is introduced
into murine embryonic stem cells by electroporation. Cells that have undergone homologous
recombination (and hence disruption of the gene) are selected by resistance to G418, and used
to generate chimeric mice using well-known methods. See Lufkin et al., supra. Traditional
breeding methods then can be used to generate mice that are homozygous for the disrupted 
gene. 
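The cleave-and-insert step above can be modeled in silico on the top strand of the genomic fragment. The following Python sketch assumes the enzyme is EcoRI (recognition site GAATTC, top-strand cut G^AATTC), which the text does not specify, and represents the neo cassette by a placeholder sequence; the function name is hypothetical, and insert orientation and the double-stranded overhang chemistry are not modeled.

```python
# Hypothetical in-silico sketch of inserting a cassette at a restriction site.
# Assumption: the enzyme is EcoRI (recognition GAATTC, top-strand cut G^AATTC);
# the text does not name the enzyme. The cassette is supplied as the top strand
# of an EcoRI-digested fragment, i.e. it begins with AATTC and ends with G.

ECORI_SITE = "GAATTC"
ECORI_TOP_CUT = 1  # cut falls after the first base of the site


def insert_at_first_ecori(genomic: str, ecori_fragment: str) -> str:
    """Return the top strand after ligating `ecori_fragment` into the first EcoRI site."""
    genomic = genomic.upper()
    fragment = ecori_fragment.upper()
    if not (fragment.startswith("AATTC") and fragment.endswith("G")):
        raise ValueError("fragment must carry EcoRI-compatible ends (AATTC...G)")
    site = genomic.find(ECORI_SITE)
    if site == -1:
        raise ValueError("no EcoRI site in the genomic fragment")
    cut = site + ECORI_TOP_CUT
    # Ligation restores a GAATTC site at each junction.
    return genomic[:cut] + fragment + genomic[cut:]
```

Because the cassette carries compatible ends, an EcoRI site is reconstituted on both sides of the insert, which is one way to confirm the junctions in the modeled product.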

The phenotype of mice that are homozygous for the mutation then can be studied to 
provide insights into the role of the protein in, for example, carcinogenesis. These mice also 
can be used as models for developing new treatments for cancers. If this mutation is lethal in
homozygous mice (for example, during embryogenesis), heterozygous mice, which express
only half the amount of the protein, can also be studied.



GENE THERAPY APPLICATIONS 

When mutations in the inventive protein, or in the elements controlling expression of 
that protein, are found to be associated with a malignant phenotype, control of cellular 
proliferation can be restored by gene therapy methods. For example, overexpression of the 
protein can be counteracted by concurrent expression of an antisense molecule that binds to 
and inhibits expression of the mRNA encoding the protein. Alternatively, overexpression can 
be inhibited in an analogous manner using a ribozyme that cleaves the mRNA. In another 
embodiment, where expression of a mutated protein induces the malignant phenotype, 
concomitant expression of the non-mutated molecule via introduction of an exogenous gene 
may be used. Methods of using antisense and ribozyme technology to control gene 
expression, or of gene therapy methods for expression of an exogenous gene in this manner 
are well known in the art. 

Each of these methods requires a system for introducing a vector into the cells 
containing the mutated gene. The vector encodes either an antisense or ribozyme transcript of 
the inventive protein. The construction of a suitable vector can be achieved by any of the 
methods well known in the art for the insertion of exogenous DNA into a vector. See, e.g.,
Sambrook et al., Molecular Cloning (Cold Spring Harbor Press, 2d ed. 1989), which is
incorporated herein by reference. In addition, the prior art teaches various methods of
introducing exogenous genes into cells in vivo. See Rosenberg et al., Science 242:1575-1578
(1988) and Wolff et al., PNAS 86:9011-9014 (1989), which are incorporated herein by
reference. The routes of delivery include systemic administration and administration in situ.
Well-known techniques include systemic administration with cationic liposomes, and
administration in situ with viral vectors. Any one of the gene delivery methodologies
described in the prior art is suitable for the introduction of a recombinant vector containing an
inventive gene according to the invention into an MTX-resistant, transport-deficient cancer
cell. A listing of present-day vectors suitable for the purpose of this invention is set forth in
Hodgson, Bio/Technology 13:222 (1995), which is incorporated by reference.

For example, liposome-mediated gene transfer is a suitable method for the 
introduction of a recombinant vector containing an inventive gene according to the invention 
into an MTX-resistant, transport-deficient cancer cell. The use of a cationic liposome, such as
DC-Chol/DOPE liposome, has been widely documented as an appropriate vehicle to deliver
DNA to a wide range of tissues through intravenous injection of DNA/cationic liposome
complexes. See Caplen et al., Nature Med. 1:39-46 (1995) and Zhu et al., Science 261:209-211
(1993), which are herein incorporated by reference. Liposomes transfer genes to the
target cells by fusing with the plasma membrane. The entry process is relatively efficient, but
once inside the cell, the liposome-DNA complex has no inherent mechanism to deliver the
DNA to the nucleus. As a result, most of the lipid and DNA is shunted to cytoplasmic
waste systems and destroyed. The obvious advantage of liposomes as a gene therapy vector is
that they contain no proteins, which minimizes the potential for host immune
responses.

As another example, viral vector-mediated gene transfer is also a suitable method for 
the introduction of the vector into a target cell. Appropriate viral vectors include adenovirus 
vectors and adeno-associated virus vectors, retrovirus vectors and herpesvirus vectors. 

Adenoviruses are linear, double stranded DNA viruses complexed with core proteins 
and surrounded by capsid proteins. The common serotypes 2 and 5, which are not associated 
with any human malignancies, are typically the base vectors. By deleting parts of the virus 
genome and inserting the desired gene under the control of a constitutive viral promoter, the 
virus becomes a replication deficient vector capable of transferring the exogenous DNA to 
differentiated, non-proliferating cells. To enter cells, the adenovirus fibre interacts with 
specific receptors on the cell surface, and the adenovirus surface proteins interact with the cell 
surface integrins. The virus penton-cell integrin interaction provides the signal that brings the 
exogenous gene-containing virus into a cytoplasmic endosome. The adenovirus breaks out of 
the endosome and moves to the nucleus, the viral capsid falls apart, and the exogenous DNA 
enters the cell nucleus where it functions, in an epichromosomal fashion, to express the 
exogenous gene. Detailed discussions of the use of adenoviral vectors for gene therapy can
be found in Berkner, Biotechniques 6:616-629 (1988) and Trapnell, Advanced Drug Delivery
Rev. 12:185-199 (1993), which are herein incorporated by reference. Adenovirus-derived
vectors, particularly non-replicative adenovirus vectors, are characterized by their ability to
accommodate exogenous DNA of 7.5 kb, relative stability, wide host range, low
pathogenicity in man, and high titers (10⁴ to 10⁵ plaque-forming units per cell). See
Stratford-Perricaudet et al., PNAS 89:2581 (1992).
Adeno-associated virus (AAV) vectors also can be used for the present invention.
AAV is a linear single-stranded DNA parvovirus that is endogenous to many mammalian 
species. AAV has a broad host range despite the limitation that AAV is a defective 
parvovirus which is dependent totally on either adenovirus or herpesvirus for its reproduction 
in vivo. The use of AAV as a vector for the introduction into target cells of exogenous DNA 
is well known in the art. See, e.g., Lebkowski et al., Mol. Cell. Biol. 8:3988 (1988),
which is incorporated herein by reference. In these vectors, the capsid gene of AAV is 
replaced by a desired DNA fragment, and transcomplementation of the deleted capsid 
function is used to create a recombinant virus stock. Upon infection the recombinant virus 
uncoats in the nucleus and integrates into the host genome. 

Another suitable virus-based gene delivery mechanism is retroviral vector-mediated 
gene transfer. In general, retroviral vectors are well known in the art. See Breakefield et al.,
Mol. Neurobiol. 1:339 (1987) and Shih et al., in Vaccines 85:177 (Cold Spring Harbor
Press 1985). A variety of retroviral vectors and retroviral vector-producing cell lines can be 
used for the present invention. Appropriate retroviral vectors include Moloney Murine 
Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous 
Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, human immunodeficiency virus, 
myeloproliferative sarcoma virus, and mammary tumor virus. These vectors include 
replication-competent and replication-defective retroviral vectors. In addition, amphotropic 
and xenotropic retroviral vectors can be used. In carrying out the invention, retroviral 
vectors can be introduced to a tumor directly as free retroviral vector or in the form of
retroviral vector-producing cell lines. Suitable producer cells include fibroblasts, neurons, glial cells,
keratinocytes, hepatocytes, connective tissue cells, ependymal cells, and chromaffin cells. See
Wolff et al., PNAS 84:3344 (1989).

Retroviral vectors generally are constructed such that the majority of their structural
genes are deleted or replaced by exogenous DNA of interest, and such that the likelihood is
reduced that viral proteins will be expressed. See Bender et al., J. Virol. 61:1639 (1987) and
Armentano et al., J. Virol. 61:1647 (1987), which are herein incorporated by reference. To
facilitate expression of the antisense or ribozyme molecule directed against the inventive
protein, a retroviral vector employed in the present invention must integrate into the host
cell genome, an event which occurs only in mitotically active cells. The necessity for host
cell replication effectively limits retroviral gene expression to tumor cells, which are highly
replicative, and to a few normal tissues. The normal tissue cells theoretically most likely to
be transduced by a retroviral vector, therefore, are the endothelial cells that line the blood
vessels that supply blood to the tumor. In addition, it is also possible that a retroviral vector
would integrate into white blood cells either in the tumor or in the blood circulating through
the tumor.

The spread of retroviral vector to normal tissues, however, is limited. The local 
administration to a tumor of a retroviral vector or retroviral vector producing cells will 
restrict vector propagation to the local region of the tumor, minimizing transduction, 
integration, expression and subsequent cytotoxic effect on surrounding cells that are 
mitotically active. 

Both replicatively deficient and replicatively competent retroviral vectors can be used 
in the invention, subject to their respective advantages and disadvantages. For instance, for 
tumors that have spread regionally, such as lung cancers, the direct injection of cell lines that 
produce replication-deficient vectors may not deliver the vector to a large enough area to 
completely eradicate the tumor, since the vector will be released only from the original
producer cells and their progeny, and diffusion is limited. Similar constraints apply to the
application of replication-deficient vectors to tumors that grow slowly, such as human breast
cancers, which typically have doubling times of 30 days versus the 24 hours common among
human gliomas. The much shorter survival time of the producer cells, probably no more
than 7-14 days in the absence of immunosuppression, limits the exposure of the tumor cells
to the retroviral vector to only a portion of their replicative cycle.
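The contrast between breast cancers and gliomas can be made concrete with the figures quoted above. The following Python sketch assumes steady exponential growth with a fixed doubling time, which is a simplification; the function name is hypothetical.

```python
# Back-of-the-envelope comparison using the figures quoted in the text:
# producer cells survive roughly 7-14 days; breast cancers double in ~30 days,
# gliomas in ~1 day. Assumes steady exponential growth with a fixed doubling time.


def doublings_in_window(window_days: float, doubling_time_days: float) -> float:
    """Number of population doublings a tumor completes while producer cells survive."""
    if doubling_time_days <= 0:
        raise ValueError("doubling time must be positive")
    return window_days / doubling_time_days


for tumor, td in (("breast cancer", 30.0), ("glioma", 1.0)):
    for window in (7.0, 14.0):
        n = doublings_in_window(window, td)
        print(f"{tumor}: {window:.0f}-day window covers {n:.2f} doubling times")
```

Even at the 14-day upper bound, a breast tumor completes less than half of one doubling time while vector is being produced, whereas a glioma completes about fourteen. Since retroviral integration requires mitosis, this is the quantitative basis for the limitation described in the text.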

The use of replication-defective retroviruses for treating tumors requires producer 
cells and is limited because each replication-defective retrovirus particle can enter only a 
single cell and cannot productively infect others thereafter. Because these replication- 
defective retroviruses cannot spread to other tumor cells, they would be unable to completely 
penetrate a deep, multilayered tumor in vivo. See Markert et al., Neurosurg. 77:590 (1992).
The injection of replication-competent retroviral vector particles or a cell line that produces a 
replication-competent retroviral vector virus may prove to be a more effective therapeutic 
because a replication competent retroviral vector will establish a productive infection that will 
transduce cells as long as it persists. Moreover, replicatively competent retroviral vectors 
may follow the tumor as it metastasizes, carried along and propagated by transduced tumor 
cells. The risk of complications is greater with replicatively competent vectors, however.
Such vectors may pose a greater risk than replicatively deficient vectors of transducing normal
tissues, for instance. The risks of undesired vector propagation for each type of cancer and
affected body area can be weighed against the advantages of replicatively
competent versus replicatively deficient retroviral vectors to determine an optimum treatment.

Both amphotropic and xenotropic retroviral vectors may be used in the invention. 
Amphotropic viruses have a very broad host range that includes most or all mammalian cells, 
as is well known in the art. Xenotropic viruses can infect all mammalian cells except
mouse cells. Thus, amphotropic and xenotropic retroviruses from many species, including 
cows, sheep, pigs, dogs, cats, rats, and mice, inter alia can be used to provide retroviral 
vectors in accordance with the invention, provided the vectors can transfer genes into 
proliferating human cells in vivo. 

Clinical trials employing retroviral vector therapy treatment of cancer have been 
approved in the United States. See Culver, Clin. Chem. 40: 510 (1994). Retroviral vector- 
containing cells have been implanted into brain tumors growing in human patients. See 
Oldfield et al., Hum. Gene Ther. 4:39 (1993). These retroviral vectors carried the HSV-1
thymidine kinase (HSV-tk) gene into the surrounding brain tumor cells, which conferred 
sensitivity of the tumor cells to the antiviral drug ganciclovir. Some of the limitations of 
current retroviral based cancer therapy, as described by Oldfield are: (1) the low titer of virus 
produced, (2) virus spread is limited to the region surrounding the producer cell implant, (3) 
possible immune response to the producer cell line, (4) possible insertional mutagenesis and 
transformation of retroviral infected cells, (5) only a single treatment regimen of pro-drug, 
ganciclovir, is possible because the "suicide" product kills retrovirally infected cells and 
producer cells and (6) the bystander effect is limited to cells in direct contact with retrovirally 
transformed cells. See Bi et al. , Human Gene Therapy 4: 725 (1993). 

Yet another suitable virus-based gene delivery mechanism is herpesvirus vector- 
mediated gene transfer. While much less is known about the use of herpesvirus vectors, 
replication-competent HSV-1 viral vectors have been described in the context of antitumor 
therapy. See Martuza et al., Science 252:854 (1991), which is incorporated herein by
reference. 
The present invention also contemplates, for certain molecules described below,
methods for diagnosis of human disease. In particular, patients can be screened for the 
occurrence of cancers, or likelihood of occurrence of cancers, associated with mutations in 
the encoded protein. DNA from tumor tissue obtained from patients suffering from cancer 
can be isolated and the gene encoding the protein can be sequenced. By examining a number 
of patients in this manner, mutations in the gene that are associated with a malignant cellular 
phenotype can be identified. In addition, correlation of the nature of the observed mutations 
with subsequent observed clinical outcomes allows development of a prognostic model for the
predicted outcome in a particular patient. 

Screening for mutations conveniently can be carried out at the DNA level by use of
PCR, although the skilled artisan will be aware that many other well-known methods are
available for the screening. PCR primers can be selected that flank known mutation sites, and
the PCR products can be sequenced to detect the occurrence of the mutation. Alternatively,
the 3' residue of one PCR primer can be selected to match only the residue found in
the unmutated gene. If the gene is mutated, there will be a mismatch at the 3' end of the
primer, primer extension cannot occur, and no PCR product will be obtained.
Alternatively, primer mixtures can be used in which the 3' residue of one primer is any
nucleotide other than the nonmutated residue. Observation of a PCR product then indicates
that a mutation has occurred. Other methods of using, for example, oligonucleotide probes to
screen for mutations are described, for example, in U.S. Patent No. 4,871,838, which is
herein incorporated by reference in its entirety.
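By way of illustration only, the 3'-mismatch principle can be modelled in a few lines of Python. This is an idealized sketch, not a thermodynamic model of primer annealing. It assumes that the primer anneals only where all of its bases except the 3'-terminal one match, and that extension occurs if and only if the 3'-terminal base also matches. Identical sequences are compared here as a stand-in for base pairing with the complementary strand. The function names (`extends`, `mutant_mixture`, `pcr_result`) and the example sequences are hypothetical.

```python
def extends(primer: str, template: str, site: int) -> bool:
    """Idealized test of whether `primer`, annealed at `site`, can be extended.

    `primer` is written 5'->3' and has the same sequence as `template` (the
    strand it copies, so it anneals to the complement); `site` is the 0-based
    template position of the primer's 5' base.
    """
    end = site + len(primer) - 1  # template position under the primer's 3' base
    anneals = template[site:end] == primer[:-1]
    return anneals and end < len(template) and template[end] == primer[-1]


def mutant_mixture(primer: str) -> list:
    """Primers whose 3' base is any nucleotide other than the wild-type base."""
    return [primer[:-1] + b for b in "ACGT" if b != primer[-1]]


def pcr_result(primers: list, template: str, site: int) -> str:
    # A product appears if any primer in the reaction can be extended.
    return "product" if any(extends(p, template, site) for p in primers) else "no product"


wild_type = "ACGTTGCAAC"
mutant = "ACGTTGCTAC"    # A -> T at index 7, under the primer's 3' end
wt_primer = "ACGTTGCA"   # 3' base matches only the wild-type residue
print(pcr_result([wt_primer], mutant, 0))                 # no product
print(pcr_result(mutant_mixture(wt_primer), mutant, 0))   # product
```

The two calls correspond to the two alternatives described above: a wild-type-specific primer gives no product on the mutant template, whereas the mixture of non-wild-type primers gives a product only when a mutation is present.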

Alternatively, antibodies can be generated that selectively bind either mutated or nonmutated
protein. The antibodies then can be used to screen tissue samples for occurrence of
mutations in a manner analogous to the DNA-based methods described supra.

The diagnostic methods described above can be used not only for diagnosis and
prognosis of existing disease, but also to predict the likelihood of the future
occurrence of disease. For example, clinically healthy patients can be screened for mutations
in the inventive molecule that correlate with later disease onset. Such mutations may be
observed in the heterozygous state in healthy individuals. In such cases a single additional
mutation event, affecting the remaining normal allele, can effectively disable proper functioning
of the gene and induce a transformed or malignant phenotype. This screening also may be
carried out prenatally or neonatally.






WO 01/12659 PCT/IB00/01496

DNA molecules according to the invention also are well suited for use in so-called 
"gene chip" diagnostic applications. Such applications have been developed by, inter alia, 
Synteni and Affymetrix. Briefly, all or part of the DNA molecules of the invention can be 
used either as a probe to screen a polynucleotide array on a "gene chip," or they may be 
immobilized on the chip itself and used to identify other polynucleotides via hybridization to 
the surface of the chip. In this manner, for example, related genes can be identified, or
expression patterns of the gene in various tissues can be studied simultaneously. Such gene
chips have particular application for diagnosis of disease, or in forensic analysis to detect the
presence or absence of an analyte. Suitable chip technology is described, for example, in
Wodicka et al., Nature Biotechnology 15: 1359 (1997), which is hereby incorporated by
reference in its entirety, and in references cited therein.

PROTEIN-PROTEIN INTERACTIONS 

Due to their similarity to certain known proteins, it is anticipated that some of the
inventive protein molecules will interact with another class of cellular proteins. This is
particularly true of those molecules containing leucine zipper motifs.

Any method suitable for detecting protein-protein interactions can be employed for 
identifying interacting targets. Among the traditional methods which can be employed are co- 
immunoprecipitation, crosslinking and co-purification through gradients or chromatographic 
columns. Utilizing procedures such as these allows for the identification of GAP gene 
products. Once identified, a GAP protein can be used, in conjunction with standard 
techniques, to identify its corresponding pathway gene. For example, at least a portion of the 
amino acid sequence of the pathway gene product can be ascertained using techniques well 
known to those of skill in the art, such as via the Edman degradation technique (see, e.g.,
Creighton, 1983, PROTEINS: STRUCTURES AND MOLECULAR PRINCIPLES, W.H.
Freeman & Co., N.Y., pp. 34-49). The amino acid sequence obtained can be used as a guide
for the generation of oligonucleotide mixtures that can be used to screen for pathway gene
sequences. Screening can be accomplished, for example, by standard hybridization or PCR
techniques. Techniques for the generation of oligonucleotide mixtures and for screening are
well known. (See, e.g., Ausubel, supra, and PCR PROTOCOLS: A GUIDE TO METHODS
AND APPLICATIONS, 1990, Innis et al., eds., Academic Press, Inc., New York).
Additionally, methods can be employed that result in the simultaneous identification
of interacting target genes. One method that detects protein interactions in vivo, the
two-hybrid system, is described in detail for illustration purposes only and not by way of
limitation. One version of this system has been described (Chien et al., Proc. Natl. Acad.
Sci. USA 88: 9578-9582 (1991)) and is commercially available from Clontech (Palo Alto,
CA).

Briefly, utilizing such a system, plasmids are constructed that encode two hybrid 
proteins: one consists of the DNA-binding domain of a transcription activator protein fused to 
a known protein, in this case an inventive protein, and the other contains the activator 
protein's activation domain fused to an unknown protein (a putative GAP, for instance) that is 
encoded by a cDNA which has been recombined into this plasmid as part of a cDNA library. 
The plasmids are transformed into a strain of the yeast Saccharomyces cerevisiae that contains 
a reporter gene (e.g., lacZ) whose regulatory region contains the transcription activator's 
binding sites. Neither hybrid protein alone can activate transcription of the reporter gene:
the DNA-binding domain hybrid cannot because it does not provide activation function, and
the activation domain hybrid cannot because it cannot localize to the activator's binding sites.
Interaction of the two hybrid proteins reconstitutes the functional activator protein and results 
in expression of the reporter gene, which is detected by an assay for the reporter gene 
product. 

The two-hybrid system or related methodology can be used to screen activation 
domain libraries for proteins that interact with a known "bait" gene product. By way of 
example, and not by way of limitation, gene products known to be involved in TH cell 
subpopulation-related disorders and/or differentiation, maintenance, and/or effector function 
of the subpopulations can be used as the bait gene products. Total genomic or cDNA
sequences are fused to the DNA encoding an activation domain. This library and a plasmid
encoding a hybrid of the bait gene product fused to the DNA-binding domain are
cotransformed into a yeast reporter strain, and the resulting transformants are screened for
those that express the reporter gene. For example, and not by way of limitation, the bait gene
can be cloned into a vector such that it is translationally fused to the DNA encoding the
DNA-binding domain of the GAL4 protein. Colonies that express the reporter gene are
purified, and the library plasmids responsible for reporter gene expression are isolated. DNA
sequencing is then used to identify the proteins encoded by the library plasmids.
The present invention, thus generally described, will be understood more readily by
reference to the following examples, which are provided by way of illustration and are not
intended to be limiting of the present invention.

EXAMPLES 

EXAMPLE I: cDNA Library Construction 

cDNA library plates and clones originated from five cDNA libraries that were 
constructed by directional cloning. These are available through the Resource Center 
(http://www.rzpd.de) of the German Genome Project. In particular, the hfbr2 (human fetal
brain; RZPD number DKFZp564) and hfkd2 (human fetal kidney; DKFZp566) libraries were
generated using the SMART kit (Clontech), except that PCR was carried out with primers that
contained uracil residues, to permit directional cloning without restriction digestion and
ligation, and that were complementary to the pAMP1 (Life Technologies) cloning sites for
directional cloning. The htes3 (human testes; DKFZp434), hute1 (human uterus; DKFZp586)
and hmcf1 (human mammary carcinoma; DKFZp727) libraries are conventional, size-selected
cDNA libraries (Gubler, U., Hoffman, B.J. (1983) A simple and very efficient method for
generating cDNA libraries. Gene 25, 263-269). They are cloned into pSPORT1
(Life Technologies) via a NotI site, which is introduced during reverse transcription
downstream of the oligo(dT) primer, and a SalI site, which is introduced by the ligation of
adapters. The human mammary carcinoma library was constructed from MCF-7 cells.

The cDNA sequences of this application were first identified among the sequences 
comprising various libraries. Technology has advanced considerably since the first cDNA 
libraries were made. Many small variations in both chemicals and machinery have been 
instituted over time, and these have improved both the efficiency and safety of the process. 
Although the cDNAs could be obtained using an older procedure, the procedure presented in 
this application is exemplary of one currently being used by persons skilled in the art. For the
purpose of providing an exemplary method, the mRNA isolation and cDNA library
construction described here are for the MCF-7 library (DKFZp727), from which the clones
named DKFZphmcf1xxyyxx were obtained.

The human cell line MCF-7 was grown in DMEM supplemented with 10% fetal calf
serum until confluency. A total of 3 x 10^8 cells were harvested with a cell scraper in PBS.
Cells were lysed in buffer containing 0.5% NP-40 to leave the nuclei intact. The debris was
pelleted by centrifugation at 15,000 x g for 10 minutes at 4 degrees Celsius. Proteins in the
supernatant were degraded in the presence of SDS and proteinase K (30 minutes at 56 degrees
Celsius). Proteins were removed by phenol/chloroform extraction, and RNA was precipitated
from the aqueous phase with sodium acetate and ethanol. Polyadenylated messages were
isolated using Qiagen Oligotex (QIAGEN, Hilden, Germany).

First-strand cDNA synthesis was accomplished using an oligo(dT) primer that also
contained a NotI restriction site. Second-strand synthesis was performed using a
combination of DNA polymerase I, E. coli ligase and RNase H, followed by the addition of a
SalI adaptor to the blunt-ended cDNA. The SalI-adapted, double-stranded cDNA was then
digested with NotI restriction enzyme and fractionated by size on an agarose gel. DNA of the
appropriate size was cut from the gel and cast into a second gel at a 90° angle. After
electrophoresis in the second dimension, cDNA of the appropriate size was cut from the gel.
The agarose block was digested with gelase, and the cDNA was purified by two phenol
extractions and an ethanol precipitation. The cDNA was ligated into SalI/NotI
pre-digested pSPORT1 vector (Life Technologies) and transformed into DH10B bacteria.

The libraries were arrayed into 384-well microtiter plates and spotted on high density 
nylon membranes for hybridization analysis. Filters and clones are available through the 
Resource Center. Whole plates were distributed to the sequencing partners of the consortium 
for systematic sequencing. 

EXAMPLE II: Sequencing of cDNA Clones 

All clones in the 384-well microtiter plates were sequenced from the 5' end.
Sequencing was done preferentially using dye terminator chemistry (ABD or Amersham) on
ABI automated DNA sequencers (ABI 377, Applied Biosystems); one partner used EMBL
prototype instruments (Arakis), mainly with dye primer chemistry.



The resulting expressed sequence tag (EST) sequences ("r1 ESTs" = sequenced from the
5' end) were analysed for:

a) the lack of identical matches with known genes. 

For this, the EST sequence was compared by BLAST first against the cDNA consortium's own
database and then against public databases (BLASTn and BLASTx against EMBL/EMBLNEW
and assembled ESTs; please refer to EXAMPLE III: Bioinformatics analysis of full length
cDNAs, for description and parameter settings). ESTs that were identical to known genes
over more than 100 bp, with fewer than 2 mismatches, were excluded from further analysis.

b) the presence of an open reading frame 

Open reading frames (ORFs) were detected with a tool called ORF-map, developed by the
Munich Information Center for Protein Sequences (MIPS). ORF-map visualises potential
start and stop codons. If an ORF without a stop codon was detected in an r1 EST,
the sequence was processed further.

c) the presence of GC rich sequences 

A script developed by MIPS computed the GC content of the r1 sequence, which
should be >40%. Writing similar scripts is within the ordinary skill of one in bioinformatics.

d) the lack of repeat structures 

Repeats such as Alu, LINE or CA repeats were detected by BLAST searches (BLASTn and
BLASTx; please refer to EXAMPLE III: Bioinformatics analysis of full length cDNAs, for
description and parameter settings) against a repeat database compiled by MIPS. If a repeat
was present within the r1 sequence, the sequence was not processed further.
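Taken together, criteria a) to d) amount to a short filtering pipeline, which can be sketched in Python as follows. The sketch is illustrative only and is not the MIPS ORF-map tool or the MIPS scripts. It assumes that BLAST results have already been parsed into simple records (the `Hit` class is hypothetical), treats an ORF as ATG-initiated, and replaces the BLAST search against the MIPS repeat database with a regular-expression scan for (CA)n/(TG)n runs of at least eight units (an arbitrary threshold), so it cannot detect Alu or LINE elements.

```python
import re
from dataclasses import dataclass
from typing import List

STOPS = {"TAA", "TAG", "TGA"}
# Simplified stand-in for the repeat database: dinucleotide CA/TG runs only.
CA_REPEAT = re.compile(r"(?:CA){8,}|(?:TG){8,}")


@dataclass
class Hit:
    """Hypothetical summary of one BLASTn alignment against a known gene."""
    aligned_length: int  # length of the aligned region, in bp
    mismatches: int      # mismatched positions within that region


def matches_known_gene(hits: List[Hit]) -> bool:
    # Criterion a): identical over more than 100 bp with fewer than 2 mismatches.
    return any(h.aligned_length > 100 and h.mismatches < 2 for h in hits)


def has_open_orf(seq: str) -> bool:
    # Criterion b): in some forward frame, an ATG-initiated reading frame
    # reaches the 3' end of the read without meeting a stop codon.
    for frame in range(3):
        in_orf = False
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if not in_orf and codon == "ATG":
                in_orf = True
            elif in_orf and codon in STOPS:
                in_orf = False
        if in_orf:
            return True
    return False


def gc_content(seq: str) -> float:
    # Criterion c): fraction of G + C among unambiguous bases (A, C, G, T).
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def has_ca_repeat(seq: str) -> bool:
    # Criterion d), simplified as described above.
    return CA_REPEAT.search(seq) is not None


def passes_r1_screen(seq: str, hits: List[Hit]) -> bool:
    """True if an r1 EST should be processed further."""
    seq = seq.upper()
    return (not matches_known_gene(hits)
            and has_open_orf(seq)
            and gc_content(seq) > 0.40
            and not has_ca_repeat(seq))


read = "GCCGCCATG" + "GCC" * 20          # GC-rich read with an open ATG frame
print(passes_r1_screen(read, []))              # True
print(passes_r1_screen(read, [Hit(150, 1)]))   # False: matches a known gene
```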

Novel clones that met all criteria were reported to the sequencers, who then
performed 3'-end sequencing of these clones. The resulting 3' ESTs ("s1 ESTs" = sequenced
from the 3' end) were checked for:

a) the lack of matches with known genes in public databases, and with sequences already
generated by us.



This was done by BLAST searches against EMBL/EMBLNEW and assembled ESTs (BLASTn
and BLASTx; please refer to EXAMPLE III: Bioinformatics analysis of full length cDNAs,
for description and parameter settings).

b) the presence of polyadenylation signals. 
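Check b) can be approximated by a motif scan. This sketch assumes that candidate signals are the canonical AATAAA hexamer and its common variant ATTAAA (written as DNA). It reports 1-based positions and allows overlapping matches. `poly_a_signals` is a hypothetical helper, not the consortium's script.

```python
import re

# AATAAA and its common variant ATTAAA; the lookahead allows overlapping hits.
PAS = re.compile(r"(?=(AATAAA|ATTAAA))")


def poly_a_signals(seq: str) -> list:
    """Return 1-based start positions of candidate polyadenylation signals."""
    return [m.start() + 1 for m in PAS.finditer(seq.upper())]


print(poly_a_signals("GGGAATAAAGGG"))  # [4]
```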

Again, only clones matching the selection criteria were chosen to be sequenced
completely by the sequencers. Clones were selected according to the following criteria:

A very good ORF had at least one BLASTx match to other proteins. A "good ORF"
should extend to the 3' end and be longer than ~40 codons. If the ORF started in the r1
sequence, there should not be too many competing start codons upstream of, and in frame
with, the ORF start codon, and the start should match the Kozak consensus ATG. If the EST
sequence was too short to decide on the basis of the potential ORF, and there were only a few
or no start codons in the sequence, the GC content of the sequence should be greater than
40%. The r1 sequences did not need to contain a poly(A) tail at the 3' end. In addition, the
results of the BLAST search against the assembled human ESTs could help in questionable
cases to decide whether to stop or to continue: a hit against these ESTs was an
indication to go further.
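Two of these criteria, the Kozak context of the start codon and an ORF that runs on to the 3' end, can be expressed as simple checks. The sketch uses the common rule of thumb that a purine at position -3 and a G at position +4 (counting the A of ATG as +1) define a strong context. The 40-codon threshold follows the text. The function names and the "strong/adequate/weak" labels are illustrative assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}


def kozak_strength(seq: str, atg: int) -> str:
    """Classify the context of the ATG whose A is at 0-based index `atg`.

    Rule of thumb: purine (A/G) at -3 and G at +4, counting the A of ATG as +1.
    """
    seq = seq.upper()
    if seq[atg:atg + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = seq[atg - 3] if atg >= 3 else ""
    plus4 = seq[atg + 3] if atg + 3 < len(seq) else ""
    score = (minus3 in ("A", "G")) + (plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[score]


def orf_to_3prime_end(seq: str, atg: int, min_codons: int = 40) -> bool:
    """True if the frame opened at `atg` has no in-frame stop codon and more
    than `min_codons` complete codons before the 3' end of the read."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(atg, len(seq) - 2, 3)]
    return len(codons) > min_codons and not any(c in STOPS for c in codons)


# Start-codon context of the ORF reported at 240 bp for DKFZphfbr2_16cl6
# (listing positions 231-250; the A of ATG is at index 9).
context = "TGAGGGAGGATGGCAGCCTC"
print(kozak_strength(context, 9))  # strong
```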

Clones passing the above-described screening were sequenced in full. Sequencing was
done preferentially using dye terminator chemistry (ABD or Amersham) on ABI automated
DNA sequencers (ABI 377, Applied Biosystems); one partner used EMBL prototype
instruments (Arakis), mainly with dye primer chemistry. Primer walking (Strauss et al., 1986,
Specific-primer-directed DNA sequencing. Anal. Biochem. 154, 353-360) was the preferred
sequencing strategy because of the lower redundancy possible compared to random shotgun
methods (Messing, J., Crea, R., Seeburg, H.P. (1981) A system for shotgun DNA sequencing.
Nucleic Acids Res. 9, 32-39). Walking primers were generally designed using software (e.g.,
Haas, S., Vingron, M., Poustka, A., Wiemann, S. (1998) Primer design in large-scale
sequencing. Nucleic Acids Res. 26, 3006-3012; Schwager, C., Wiemann, S., Ansorge, W.
(1995) GeneSkipper: integrated software environment for DNA sequence assembly and
alignment. HUGO Genome Digest 2, 8-9) that permitted complete automation of this usually
time-consuming process and helped in the parallel processing of large numbers of clones.



EXAMPLE III: Bioinformatics analysis of full length cDNAs 

Each sequence obtained was compared at the nucleotide level, in a stepwise manner, to
sequences in EMBL/EMBLNEW, EMBL-EST and EMBL-STS using the BLASTn algorithm.
The Basic Local Alignment Search Tool (BLAST; Altschul, S.F. (1993) J Mol Evol 36:290-300;
Altschul, S.F. et al. (1990) J Mol Biol 215:403-10) is used to search for local sequence
alignments. BLAST produces alignments of both nucleotide (BLASTn) and amino acid
sequences (BLASTp or BLASTx) to determine sequence similarity. Because of the local
nature of the alignments, BLAST is especially useful in determining exact matches or in
identifying homologs. While it is useful for matches that do not contain gaps, it is
inappropriate for performing motif-style searching. The fundamental unit of BLAST algorithm
output is the High-scoring Segment Pair (HSP).

An HSP consists of two sequence fragments of arbitrary but equal lengths whose
alignment is locally maximal and for which the alignment score meets or exceeds a
threshold or cutoff score set by the user. The BLAST approach is to look for HSPs between a
query sequence and a database sequence, to evaluate the statistical significance of any
matches found, and to report only those matches that satisfy the user-selected threshold of
significance. The parameter E establishes the statistically significant threshold for reporting
database sequence matches. E is interpreted as the upper bound of the expected frequency of
chance occurrence of an HSP (or set of HSPs) within the context of the entire database
search. Any database sequence whose match satisfies E is reported in the program output.
Parameter settings for the BLAST operations (BLASTN 2.0a19MP-WashU) described were:
EMBL/EMBLNEW: H=0 V=5 B=5 -filter seg; EMBL-EST: H=0 E=1e-10 B=500 V=500
-filter seg; EMBL-STS: H=0 V=5 B=5.
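For orientation, the relationship between a bit score and E, and between E and the P-values reported in the BLAST results, can be sketched as follows. The sketch uses the simplified textbook relations E = m * n * 2^(-S') (query length m, database length n, bit score S') and P = 1 - exp(-E). It ignores the effective-length corrections and sum statistics applied by WU-BLAST, so it will not reproduce the reported values exactly. The function names are hypothetical.

```python
import math


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Simplified E = m * n * 2**(-S') for a normalised (bit) score S'.

    Ignores edge-effect (effective length) corrections and sum statistics.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def pvalue_from_evalue(e: float) -> float:
    """Probability of at least one chance HSP: P = 1 - exp(-E).

    math.expm1 keeps precision when E is tiny, where P is almost equal to E.
    """
    return -math.expm1(-e)


print(evalue_from_bits(30.0, 100, 10**6))  # about 0.093
print(pvalue_from_evalue(1.9e-81))         # about 1.9e-81
```

For very small E, P and E are practically identical, which is why the listed hits report the same number for Expect and P.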

Search against EMBL/EMBLNEW was done to determine whether the cDNAs are 
already known, and also to find out whether the cDNAs are encoded by genomic sequences 
already sequenced and published/submitted to these databases. 



Search against EMBL-EST was performed to get a first impression of how abundant a
particular cDNA would be and to get information on tissue specificity (a so-called "electronic
Northern blot"; e.g., some of the cDNAs derived from the testis library show hits only to ESTs
also derived from testis libraries).
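The "electronic Northern" amounts to tallying EST hits by the tissue of the library they came from. A minimal sketch follows. It assumes that the EMBL-EST hits have already been reduced to (accession, tissue) pairs; the accessions in the example are placeholders and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def electronic_northern(hits: Iterable[Tuple[str, str]]) -> Counter:
    """Count EST hits per source tissue from (accession, tissue) pairs."""
    return Counter(tissue for _, tissue in hits)


def restricted_to(hits: Iterable[Tuple[str, str]], tissue: str) -> bool:
    """True if there is at least one hit and every hit comes from `tissue`."""
    counts = electronic_northern(hits)
    return bool(counts) and set(counts) == {tissue}


hits = [("EST_1", "testis"), ("EST_2", "testis"), ("EST_3", "brain")]
print(electronic_northern(hits))        # Counter({'testis': 2, 'brain': 1})
print(restricted_to(hits[:2], "testis"))  # True
```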

The cDNA sequences were compared by BLAST against EMBL-STS to identify STS sequences
matching the cDNA, thus providing mapping information for the new cDNA.

The potential protein sequences were generated automatically by a script searching
for the longest open reading frame (ORF) in each of the three forward frames, with a
minimum length of 90 codons. Next, the automatically generated ORFs were translated into
protein sequences. These protein sequences were searched against the non-redundant protein
data set of PIR/SwissProt/TrEMBL/TrEMBLNEW (BLASTP 2.0a19MP-WashU, parameter
setting: V=7 B=7 H=0 -filter seg). If the script generated more than one ORF, one ORF was
chosen manually by the annotator according to the degree of similarity to known proteins, the
location of the ORF in the cDNA, the length, the amino acid composition and the content of
PROSITE motifs.
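The ORF-finding step can be sketched in Python. The text does not define exactly how the script delimited an ORF, so this sketch assumes that an ORF runs from an ATG to the next in-frame stop codon (or to the 3' end of the cDNA if no stop follows). Length is counted in codons excluding the stop, and only the longest ORF of at least 90 codons is kept in each forward frame. `longest_orfs` is a hypothetical name.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orfs(seq: str, min_codons: int = 90) -> dict:
    """Map each forward frame (0, 1, 2) to the (start, end) slice of its longest
    ATG-initiated ORF of at least `min_codons` codons, or to None.

    `end` is exclusive and excludes the stop codon, so seq[start:end] is the CDS.
    """
    seq = seq.upper()
    best = {}
    for frame in range(3):
        best[frame], best_len, start = None, 0, None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                length = (i - start) // 3
                if length >= min_codons and length > best_len:
                    best[frame], best_len = (start, i), length
                start = None
        # An ORF still open at the 3' end runs to the last complete codon.
        if start is not None:
            end = start + 3 * ((len(seq) - start) // 3)
            if (end - start) // 3 >= min_codons and (end - start) // 3 > best_len:
                best[frame] = (start, end)
    return best


cdna = "CC" + "ATG" + "GCT" * 99 + "TAA" + "GG"
print(longest_orfs(cdna))  # {0: None, 1: None, 2: (2, 302)}
```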

Additionally, a BLASTx search (BLASTX 2.0a19MP-WashU against the non-redundant protein
database comprising PIR/SWISSPROT/TREMBL/TREMBLNEW; parameter settings were:
matrix/home/data/blast/matrix/aa/BLOSUM62 H=0 V=5 B=5 -filter seg) was performed to find
potential frame shifts in the coding sequences (cds) of the cDNAs and to identify unspliced or
partly spliced cDNAs. The protein sequence was then transferred to the PEDANT system in
order to generate additional information on the new proteins. PEDANT (Protein Extraction,
Description, and ANalysis Tool; Frishman, D. & Mewes, H.-W. (1997) PEDANTic genome
analysis. Trends in Genetics 13, 415-416) is a platform developed at the Munich Information
Center for Protein Sequences (MIPS, Munich, Germany) that incorporates practically all
bioinformatics methods important for the functional and structural characterisation of protein
sequences. Computational methods used by PEDANT are:
FASTA

Very sensitive protein sequence database searches with estimates of statistical
significance. Pearson, W.R. (1990) Rapid and sensitive sequence comparison with FASTP
and FASTA. Methods Enzymol. 183, 63-98.

BLAST2 

Very sensitive protein sequence database searches with estimates of statistical 
significance. Altschul S.F., Gish W., Miller W., Myers E. W., and Lipman D.J. Basic local 
alignment search tool. Journal of Molecular Biology 215, 403-10. 

PREDATOR 

High-accuracy secondary structure prediction from single and multiple sequences. 
Frishman, D. and Argos, P. (1997) 75% accuracy in protein secondary structure prediction.
Proteins 27, 329-335. Frishman, D. and Argos, P. (1996) Incorporation of long-distance
interactions in a secondary structure prediction algorithm. Prot. Eng. 9, 133-142.

STRIDE 

Secondary structure assignment from atomic coordinates. Frishman, D. and Argos,
P. (1995) Knowledge-based secondary structure assignment. Proteins 23, 566-579.

CLUSTALW 

Multiple sequence alignment. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994)
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through
sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids
Research 22, 4673-4680.

TMAP 

Transmembrane region prediction from multiply aligned sequences. Persson, B. and 
Argos, P. (1994) Prediction of transmembrane segments in proteins utilising multiple 
sequence alignments. J. Mol. Biol. 237, 182-192. 
Transmembrane region prediction from single sequences. Klein, P., Kanehisa, M.,
and DeLisi, C. (1984) Prediction of protein function from sequence properties: A discriminant
analysis of a database. Biochim. Biophys. Acta 787, 221-226. Version 2 by Dr. K.
Nakai.

SIGNALP 

Signal peptide prediction. Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G.
(1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their
cleavage sites. Protein Engineering 10, 1-6.

SEG 

Detection of low complexity regions in protein sequences. Wootton, J.C., Federhen, 
S. (1993) Statistics of local complexity in amino acid sequences and sequence databases. 
Computers & Chemistry 17, 149-163. 

COILS 

Detection of coiled coils. Lupas, A., Van Dyke, M., and Stock, J. (1991) Predicting coiled
coils from protein sequences. Science 252, 1162-1164.

PROSEARCH 

Detection of PROSITE protein sequence patterns. Kolakowski L.F. Jr., Leunissen 
J. A.M., Smith J.E. (1992) ProSearch: fast searching of protein sequences with regular 
expression patterns related to protein structure and function. Biotechniques 13, 919-921. 

BLIMPS 

Similarity searches against a database of ungapped blocks. J.C. Wallace and Henikoff 
S., (1992) PATMAT: a searching and extraction program for sequence, pattern and block 
queries and databases, CABIOS 8, 249-254. Written by Bill Alford. 

HMMER 
Hidden Markov model software. Sonnhammer E.L.L., Eddy S.R., Durbin R. (1997)
Pfam: A Comprehensive Database of Protein Families Based on Seed Alignments. Proteins 
28, 405-420. 

pI

Perl script that returns the amino acid composition, molecular weight, theoretical pI, and
expected extinction coefficient of an amino acid sequence. By Fred Lindberg.

The parameter settings were as follows: known3d: score > 100; BLAST: E-value < 10;
SCOP: <= 50 alignments, E-value < 0.0001; signalp: Y=0.7, examined from the N-terminus:
50 aa; funcat: E-value < 0.001; BLOCKS: <= 10 hits; BLIMPS: threshold 1100.0; COILS:
threshold 0.95; SEG: threshold 20.0; BLAST in report: E-value < 0.001; PIR-KW,
superfamilies, EC numbers in report: E-value < 0.00001; known3d in report: score > 120.
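As an illustration of one quantity reported by the pI script, the extinction coefficient at 280 nm is commonly estimated from residue counts. The sketch below uses the residue values of Pace et al. (1995): Trp 5500, Tyr 1490 and 125 per cystine, in M^-1 cm^-1. Whether the Lindberg script used the same constants is not stated, so treat them as an assumption. The function names are hypothetical.

```python
from collections import Counter

# Molar absorption at 280 nm per residue (M^-1 cm^-1), Pace et al. (1995).
EPS_TRP, EPS_TYR, EPS_CYSTINE = 5500, 1490, 125


def composition(protein: str) -> Counter:
    """Amino acid composition as residue counts."""
    return Counter(protein.upper())


def extinction_coefficient(protein: str, all_cys_paired: bool = True) -> int:
    """Estimated extinction coefficient at 280 nm.

    If `all_cys_paired` is True, every two Cys residues count as one cystine;
    otherwise Cys is assumed reduced and contributes nothing.
    """
    counts = composition(protein)
    eps = EPS_TRP * counts["W"] + EPS_TYR * counts["Y"]
    if all_cys_paired:
        eps += EPS_CYSTINE * (counts["C"] // 2)
    return eps


print(extinction_coefficient("WYCC"))                        # 7115
print(extinction_coefficient("WYCC", all_cys_paired=False))  # 6990
```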

The results of PEDANT analysis, together with the results of the similarity searches, 
constitute the basis for the structural and functional annotation of the cDNAs and the encoded 
proteins, as specified below. 



EXAMPLE IV: CELLULAR LOCALIZATIONS OF GFP-FUSION PROTEINS

Plasmids of cDNA-GFP fusions were transfected into mammalian tissue culture cells 
and allowed to express the proteins for up to 48 hours. Live cells were imaged at 24 hours 
and 48 hours after transfection and the localisations recorded. The chart, below, depicts the 
apparent final cellular localisations of 107 cDNA-GFP fusions. 

In order to minimize the possibility of the GFP interfering with protein function
and/or localization, two separate populations of cDNAs were generated, encoding N-terminal
or C-terminal GFP fusions. This appears to be a crucial strategy, since overall only
56% of the proteins localised to a specific compartment irrespective of the position of the
GFP. In the instances where only one fusion localised, the complementary fusion gave either
no expression or a nuclear and cytosolic staining characteristic of expression of GFP alone.

Each cDNA in turn was subjected to bioinformatic analysis. Where possible, the 
potential subcellular localisations of the expressed proteins were determined. This 
information was then compared to the actual localisations determined from expression of the
GFP-fusion proteins in mammalian cells. 
group; Cell structure and motility

DKFZphfbr2_16cl6.3 encodes a novel 586-amino-acid protein with similarity to the human
actin-binding protein MAYVEN and Drosophila kelch.

MAYVEN is a novel actin-binding protein predominantly expressed in brain. Drosophila kelch is
involved in the maintenance of ring canal organization during oogenesis. The amino-terminal
half of the protein, including the BTB domain, mediates dimerization, which might allow
cross-linking of ring canal actin filaments, thus organising the inner rim cytoskeleton. The
kelch repeat domain is necessary for ring canal localisation and is believed to mediate an
additional interaction, possibly with actin. The new protein shares the features of both
proteins and therefore should be involved in the organisation of cytoskeleton binding to
membrane proteins.

The new protein can find application in modulating or blocking cytoskeleton-membrane
protein interactions.



similarity to Drosophila kelch 
complete cDNA, complete cds, EST hits 

on genomic level partly encoded by AC005082 and AC006039



Sequenced by Qiagen 
Locus: unknown 



Insert length: 3028 bp 

Poly A stretch at pos. 3004, polyadenylation signal at pos. 2984 



1 GGGGGCCCGG GGACGCAGCC CAGTTGGTAG CGTCGCTCCC TGAGCGTTTC 
51 TAAGGGGGCC GCCCGGCCCT GTCTTTCGGC AGTGGCCGAG CCACCGCCGC 
101 CTGCCGCGCG TTCCAGAGCT GGGCGCTGCA GCTGCACTGC CGATCGCCGT 
151 GTTTGGTCGA TAGAATCCCC AGTGTGCCCA GAGAGTGCGA CCCCTCGCCC 
201 GGCCCGGCGA GCCCCGGGCG TGAACCGAGC TGAGGGAGGA TGGCAGCCTC 
251 TGGGGTGGAG AAGAGCAGCA AGAAGAAGAC CGAGAAGAAA CTTGCTGCTC 
301 GGGAAGAAGC TAAATTGTTG GCGGGTTTCA TGGGCGTCAT GAATAACATG 
351 CGGAAACAGA AAACGTTGTG TGACGTGATC CTCATGGTCC AGGAAAGAAA 
401 GATACCTGCT CATCGTGTTG TTCTTGCTGC AGCCAGTCAT TTTTTTAACT 
451 TAATGTTCAC AACTAACATG CTTGAATCAA AGTCCTTTGA AGTAGAACTC 
501 AAAGATGCTG AACCTGATAT TATTGAACAA CTGGTGGAAT TTGCTTATAC 
551 TGCTAGAATT TCCGTGAATA GCAACAATGT TCAGTCTTTG TTGGATGCAG 
601 CAAACCAATA TCAGATTGAA CCTGTGAAGA AAATGTGTGT TGATTTTTTG 
651 AAAGAACAAG TTGATGCTTC AAATTGTCTT GGTATAAGTG TGCTAGCGGA 
701 GTGTCTAGAT TGTCCTGAAT TGAAAGCAAC TGCAGATGAC TTTATTCATC 
751 AGCACTTTAC TGAAGTTTAC AAAACTGATG AATTTCTTCA ACTTGATGTC 
801 AAGCGAGTAA CACATCTTCT CAACCAGGAC ACTCTGACTG TGAGAGCAGA 
851 GGATCAGGTT TATGATGCTG CAGTCAGGTG GTTGAAATAC GATGAGCCTA 
901 ATCGCCAGCC ATTTATGGTT GATATCCTTG CTAAAGTCAG GTTTCCTCTT 
951 ATATCAAAGA ATTTCTTAAG TAAAACGGTA CAAGCTGAAC CACTTATTCA 
1001 AGACAATCCT GAATGCCTTA AGATGGTGAT AAGTGGAATG AGGTACCATC 
1051 TACTGTCTCC AGAGGACCGA GAAGAACTTG TAGATGGCAC AAGACCTAGA 
1101 AGAAAGAAAC ATGACTACCG CATAGCCCTA TTTGGAGGCT CTCAACCACA 
1151 GTCTTGTAGA TATTTTAACC CAAAGGATTA TAGCTGGACA GACATCCGCT 
1201 GCCCCTTTGA AAAACGAAGA GATGCAGCAT GCGTGTTTTG GGACAATGTA 
1251 GTATACATTT TGGGAGGCTC TCAGCTTTTC CCAATAAAGC GAATGGACTG 
1301 CTATAATGTA GTGAAGGATA GCTGGTATTC GAAACTGGGT CCTCCGACAC
1351 CTCGAGACAG CCTTGCTGCA TGTGCTGCAG AAGGCAAAAT TTATACATCT
1401 GGAGGTTCAG AAGTAGGAAA CTCAGCTCTG TATTTATTTG AGTGCTATGA 
1451 TACGAGAACT GAAAGCTGGC ACACAAAGCC CAGCATGCTG ACCCAGCGCT 
1501 GCAGCCATGG GATGGTGGAA GCCAATGGCC TAATCTATGT TTGTGGTGGA 
1551 AGTTTAGGAA ACAATGTTTC AGGGAGAGTG CTTAATTCCT GTGAAGTTTA 
1601 TGATCCTGCC ACAGAAACAT GGACTGAGCT GTGTCCAATG ATTGAAGCCA 
1651 GGAAGAATCA TGGGCTGGTA TTTGTAAAAG ACAAGATATT TGCTGTGGGT 
1701 GGTCAGAATG GTTTAGGTGG TCTGGACAAT GTGGAATATT ACGATATTAA 
1751 GTTGAACGAA TGGAAGATGG TCTCACCAAT GCCATGGAAG GGTGTAACAG 
1801 TGAAATGTGC AGCAGTTGGC TCTATAGTTT ATGTCTTGGC TGGTTTTCAG 
2301 AGAAGATTGG CTCATCAGTG AAGCGCAGTA TCTTAGCTCT AGATTCTATT
2351 TTCATGCATC ACAGAAGTGC TATACGGTTA GGTCTGTTTG TGCTCAGTCA
2401 AGAACTAAGA AATAGTATGA ATTGTAAGTC AAGATGGGCA ACTCAGATGG
2451 AGCAGCTTAG TCTCACAGTT TGCTTGTCTA TTTATTTTAT TTAGTGCCAA
2501 ATGTATTCCA TTTTAAAAGT AAGCCAGAGT GAGTCAAGGC ATATACACAC
2551 TTTCTCACAA AACTTCCTAA ACAGATTTGG GGGTTTAATA TGTCCAACTC
2601 CTCATGAAAT ATATTCAATC CACTTAAATA TATTCCATCT TTTTAACATA
2651 AAATGTAAAG CTTAGCACCC ATCATTAATT TATGTCTCTG TTTTATCCAG
2701 TGGTTAAAAA AGGATTCTGC CTCTTTAGTC CTCACTGTTA AATAAAACCC
2751 AATCATAGTA AGTGATTAAC TAGCAAAAAG TAAAGCTATT TATAGCAAAT
2801 TTCTAGATCA TTAGAAAAGC ACTGGTAGTT GTACAATATC AGTGTTGACT
2851 TTGAACTTCT TTAACGAGAT CATGAATTCT TTTCCCTTAG CCAAAACATG
2901 AAATATTTAA CCTAGTTGTC TCTAAAAGTT TTGTAATCAT GAGTTAGATA
2951 TATGTCATCT CCTATTCATT GCTTTTATGT GATCAATAAA TCTTTTACAA
3001 ACCCAAAAGA AAAAAAAAAA AAAAAAAA



BLAST Results 



Entry AC005082 from database EMBL: 

Homo sapiens clone RG271G13; HTGS phase 1, 7 unordered pieces. 
Score = 6460, P = 0.0e+00, identities = 1292/1292

4 exons matching Bp 1180-3007 

Entry AC006039 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Homo sapiens clone NH0319F03; HTGS phase 
1, 3 unordered pieces. 

Score = 1780, P = 2.0e-117, identities = 368/377 

5 exons matching Bp 6-860 

Entry HSG20603 from database EMBL: 
human STS A005Y34. 
Score = 670, P = 1.0e-23, identities = 134/134



Medline entries 



93201592: 

kelch encodes a component of intercellular bridges in
Drosophila egg chambers.

97412177: 

Drosophila kelch is an oligomeric ring canal actin organizer. 



Peptide information for frame 3 



ORF from 240 bp to 1997 bp; peptide length: 586 
Category: strong similarity to known protein 



1 MAASGVEKSS KKKTEKKLAA REEAKLLAGF MGVMNNMRKQ KTLCDVILMV 
51 QERKIPAHRV VLAAASHFFN LMFTTNMLES KSFEVELKDA EPDIIEQLVE 
101 FAYTARISVN SNNVQSLLDA ANQYQIEPVK KMCVDFLKEQ VDASNCLGIS 
151 VLAECLDCPE LKATADDFIH QHFTEVYKTD EFLQLDVKRV THLLNQDTLT 
201 VRAEDQVYDA AVRWLKYDEP NRQPFMVDIL AKVRFPLISK NFLSKTVQAE 
251 PLIQDNPECL KMVISGMRYH LLSPEDREEL VDGTRPRRKK HDYRIALFGG 
301 SQPQSCRYFN PKDYSWTDIR CPFEKRRDAA CVFWDNVVYI LGGSQLFPIK
351 RMDCYNVVKD SWYSKLGPPT PRDSLAACAA EGKIYTSGGS EVGNSALYLF
401 ECYDTRTESW HTKPSMLTQR CSHGMVEANG LIYVCGGSLG NNVSGRVLNS 
451 CEVYDPATET WTELCPMIEA RKNHGLVFVK DKIFAVGGQN GLGGLDNVEY 
501 YDIKLNEWKM VSPMPWKGVT VKCAAVGSIV YVLAGFQGVG RLGHILEYNT 
551 ETDKWVANSK VRAFPVTSCL ICVVDTCGAN EETLET 

BLASTP hits 

Entry KELC_DROME from database SWISSPROT:
RING CANAL PROTEIN (KELCH PROTEIN).
Length = 689 

Score = 816 (287.2 bits), Expect = 1.9e-81, P = 1.9e-81
Identities = 187/542 (34%), Positives = 290/542 (53%)

Entry AC004021_1 from database TREMBL:

WUGSC:H_DJ0186K10.1"; Human PAC clone DJ0186K10 from 5q31, 
complete sequence. Homo sapiens (human) 
Length = 497 
Score = 704 (247.8 bits), Expect = 1.4e-69, P = 1.4e-69
Identities = 163/483 (33%), Positives = 253/483 (52%)

Entry HSDKG12_1 from database TREMBL: 

"KIAA0132"; Human mRNA for KIAA0132 gene, complete cds. Homo 
sapiens (human) 
Length = 624

Score = 692 (243.6 bits), Expect = 2.6e-68, P = 2.6e-68
Identities = 175/527 (33%), Positives = 272/527 (51%)

Entry A45773 from database PIR: 

kelch protein, long form - fruit fly (Drosophila melanogaster)
Length = 1476

Score = 817 (287.6 bits), Expect = 1.7e-80, P = 1.7e-80
Identities = 189/549 (34%), Positives = 292/549 (53%) 
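The percentages in these hit summaries look integer-truncated rather than rounded: 187/542 is 34.5 % but is listed as 34 %, and 163/483 (33.7 %) is listed as 33 %. The Python sketch below assumes that convention, inferred only from the values listed here, which can help when re-tabulating or cross-checking hits; `blast_percent` is a hypothetical helper name.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Percentage as printed in the hit summaries above.

    Assumption (inferred from the listed values): the percentage is
    truncated toward zero, not rounded, e.g. 187/542 = 34.5 % -> 34.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# (matches, aligned length, percentage printed in the listing)
examples = [
    (187, 542, 34), (290, 542, 53),  # KELC_DROME
    (163, 483, 33), (253, 483, 52),  # AC004021_1
    (175, 527, 33), (272, 527, 51),  # HSDKG12_1
    (189, 549, 34), (292, 549, 53),  # A45773
]
for m, n, printed in examples:
    print(f"{m}/{n}: computed {blast_percent(m, n)}%, listed {printed}%")
```

Plain rounding would give 35 % for the first pair, so the truncating form is the one that matches every figure quoted above.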



Alert BLASTP hits for DKFZphfbr2_16cl6, frame 3
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_16cl6, frame 3 



Report for DKFZphfbr2_16cl6.3



[LENGTH] 586

[MW] 65992.06

[pI] 6.08

[HOMOL] PIR:A45773 kelch protein, long form - fruit fly (Drosophila melanogaster) 5e-85

[BLOCKS] BL00075D Dihydrofolate reductase proteins

[SCOP] d1gog_3 2.46.1.1.1 (151-537) Galactose oxidase, central domain 6e-36

[PIRKW] zinc finger 2e-11

[PIRKW] DNA binding 9e-10

[PIRKW] transcription factor 1e-06

[SUPFAM] A55R protein middle region homology 1e-35

[SUPFAM] POZ domain homology 1e-35

[SUPFAM] vaccinia virus 59K HindIII-C protein 5e-15

[SUPFAM] A55R protein 1e-35

[SUPFAM] myxoma virus M9-R protein 2e-11

[SUPFAM] A55R protein carboxyl-terminal homology 1e-35

[PROSITE] CAMP_PHOSPHO_SITE 2

[PROSITE] MYRISTYL 8

[PROSITE] CK2_PHOSPHO_SITE 10

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 11

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 3.75 %
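The LOW_COMPLEXITY figure appears to be the share of residues masked by SEG (the `x` characters on the SEG lines) over the full peptide length: 22 masked residues out of 586 give 3.75 %, and the same arithmetic reproduces 7.21 % (15 of 208) for DKFZphfbr2_16f21.1 and 4.47 % (44 of 984) for DKFZphfbr2_16gl8.3. A minimal sketch under that assumption; the helper name is hypothetical.

```python
def low_complexity_percent(seg_lines: list[str], length: int) -> float:
    """Percentage of residues masked as low complexity, to two decimals.

    Assumption: every 'x' on a SEG line marks one masked residue and the
    denominator is the full peptide length.
    """
    masked = sum(line.count("x") for line in seg_lines)
    return round(100 * masked / length, 2)


print(low_complexity_percent(["x" * 22], 586))  # 3.75
```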





SEQ MAASGVEKSSKKKTEKKLAAREEAKLLAGFMGVMNNMRKQKTLCDVILMVQERKIPAHRV

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD ccceeeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeccccchhhhhe

SEQ VLAAASHFFNLMFTTNMLESKSFEVELKDAEPDIIEQLVEFAYTARISVNSNNVQSLLDA 

SEG 

PRD eeccccccccccccccchhhhhheeeeccccchhhhhhhhhhhhheeeeccchhhhhhhh 

SEQ ANQYQIEPVKKMCVDFLKEQVDASNCLGISVLAECLDCPELKATADDFIHQHFTEVYKTD 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ EFLQLDVKRVTHLLNQDTLTVRAEDQVYDAAVRWLKYDEPNRQPFMVDILAKVRFPLISK

SEG 

PRD hhhchhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhccch 

SEQ NFLSKTVQAEPLIQDNPECLKMVISGMRYHLLSPEDREELVDGTRPRRKKHDYRIALFGG 

SEG 

PRD hhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccccccccccccceeeeeeecc 

SEQ SQPQSCRYFNPKDYSWTDIRCPFEKRRDAACVFWDNVVYILGGSQLFPIKRMDCYNVVKD 

SEG 

PRD ccccceeeccccccccccccccccccceeeeeeeceeeeeeccccccccceeeecccccc 

SEQ SWYSKLGPPTPRDSLAACAAEGKIYTSGGSEVGNSALYLFECYDTRTESWHTKPSMLTQR 

SEG 

PRD cccccccccccccceeeeeccceeeeeccccccccceeeeeecccccccccccccccccc 
SEQ CSHGMVEANGLIYVCGGSLGNNVSGRVLNSCEVYDPATETWTELCPMIEARKNHGLVFVK

SEG

PRD ccceeeecceeeeeecccccccccccccceeeeccccccccccccccccccccceeeeec

SEQ DKIFAVGGQNGLGGLDNVEYYDIKLNEWKMVSPMPWKGVTVKCAAVGSIVYVLAGFQGVG

SEG

PRD ceeeecccccccccccceeeccccccceeecccccccccceeeeeccceeeeeccccccc

SEQ RLGHILEYNTETDKWVANSKVRAFPVTSCLICVVDTCGANEETLET

SEG

PRD cccceeecccccccccccccccccccceeeeeeeeccccccccccc



Prosite for DKFZphfbr2_16cl6.3



PS00001  442->446  ASN_GLYCOSYLATION  PDOC00001
PS00004  11->15    CAMP_PHOSPHO_SITE  PDOC00004
PS00004  188->192  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  9->12     PKC_PHOSPHO_SITE   PDOC00005
PS00005  10->13    PKC_PHOSPHO_SITE   PDOC00005
PS00005  14->17    PKC_PHOSPHO_SITE   PDOC00005
PS00005  104->107  PKC_PHOSPHO_SITE   PDOC00005
PS00005  200->203  PKC_PHOSPHO_SITE   PDOC00005
PS00005  305->308  PKC_PHOSPHO_SITE   PDOC00005
PS00005  370->373  PKC_PHOSPHO_SITE   PDOC00005
PS00005  418->421  PKC_PHOSPHO_SITE   PDOC00005
PS00005  444->447  PKC_PHOSPHO_SITE   PDOC00005
PS00005  520->523  PKC_PHOSPHO_SITE   PDOC00005
PS00005  552->555  PKC_PHOSPHO_SITE   PDOC00005
PS00006  4->8      CK2_PHOSPHO_SITE   PDOC00006
PS00006  42->46    CK2_PHOSPHO_SITE   PDOC00006
PS00006  116->120  CK2_PHOSPHO_SITE   PDOC00006
PS00006  164->168  CK2_PHOSPHO_SITE   PDOC00006
PS00006  273->277  CK2_PHOSPHO_SITE   PDOC00006
PS00006  315->319  CK2_PHOSPHO_SITE   PDOC00006
PS00006  370->374  CK2_PHOSPHO_SITE   PDOC00006
PS00006  405->409  CK2_PHOSPHO_SITE   PDOC00006
PS00006  460->464  CK2_PHOSPHO_SITE   PDOC00006
PS00006  550->554  CK2_PHOSPHO_SITE   PDOC00006
PS00007  202->209  TYR_PHOSPHO_SITE   PDOC00007
PS00008  5->11     MYRISTYL           PDOC00008
PS00008  32->38    MYRISTYL           PDOC00008
PS00008  389->395  MYRISTYL           PDOC00008
PS00008  424->430  MYRISTYL           PDOC00008
PS00008  436->442  MYRISTYL           PDOC00008
PS00008  440->446  MYRISTYL           PDOC00008
PS00008  487->493  MYRISTYL           PDOC00008
PS00008  493->499  MYRISTYL           PDOC00008



(No Pfam data available for DKFZphfbr2_16cl6.3)
group: brain derived

DKFZphfbr2_16f21 encodes a novel 208 amino acid protein with strong similarity to human zinc 
finger protein 216. 

The novel protein shows strong similarity to the human zinc finger protein 216, but has no Zn 
finger. 

PROSITE: contains no zinc finger; no informative BLAST results; no predictive PROSITE, Pfam or
SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 



strong similarity to zinc finger protein 216 

complete cDNA, complete cds, EST hits 
start matches Kozak consensus ANNatgG, 
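The pattern ANNatgG constrains only two flanking bases: an A three bases upstream of the ATG (position -3) and a G immediately after it (position +4). In this clone the ORF's ATG (positions 115-117) sits in the context GAACATGGCT (positions 111-120), with A at -3 and G at +4. A minimal check, sketched with the hypothetical helper name `matches_annatgg` and 0-based string indexing:

```python
def matches_annatgg(seq: str, atg_index: int) -> bool:
    """True if the ATG at atg_index (0-based) fits the pattern A-N-N-ATG-G.

    Only positions -3 (A) and +4 (G) are checked, as in ANNatgG;
    the other flanking bases are unconstrained.
    """
    seq = seq.upper()
    if atg_index < 3 or atg_index + 3 >= len(seq):
        return False  # not enough flanking sequence to judge
    return (
        seq[atg_index:atg_index + 3] == "ATG"
        and seq[atg_index - 3] == "A"
        and seq[atg_index + 3] == "G"
    )


# Positions 111-120 of DKFZphfbr2_16f21; the ORF's ATG is at index 4.
context = "GAACATGGCT"
print(matches_annatgg(context, 4))  # True
```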

Sequenced by Qiagen 

Locus: unknown

Insert length: 1512 bp 

Poly A stretch at pos. 1490, polyadenylation signal at pos. 1474 
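Comparing these annotations with the listings suggests the reported positions are 0-based offsets: here the AATAAA hexamer begins at listing position 1475 and is reported as 1474, and the same one-base shift holds for the ATTAAA signals and poly(A) stretches reported for DKFZphfbr2_16gl8 (4736/4756) and DKFZphfbr2_16il2 (1506/1528). This is an inference from the sequences in this document, not a documented convention of the annotation software. The sketch below assumes it, prefers AATAAA over the ATTAAA variant, and treats the first run of at least ten A's as the poly(A) stretch; the function names and the threshold are illustrative assumptions. For this clone that simple rule places the poly(A) start at 1492 rather than the reported 1490, so the original criterion evidently tolerates some mismatches.

```python
import re
from typing import Optional

# Canonical hexamer first, then the common ATTAAA variant.
SIGNALS = ("AATAAA", "ATTAAA")


def signal_offset(fragment: str, first_pos: int) -> Optional[int]:
    """0-based offset, in the full cDNA, of the 3'-most polyadenylation signal.

    fragment  -- a stretch of the sequence listing (e.g. its last rows)
    first_pos -- 1-based listing position of fragment[0]
    AATAAA is preferred; ATTAAA is used only if no AATAAA is present.
    """
    fragment = fragment.upper()
    for motif in SIGNALS:
        idx = fragment.rfind(motif)
        if idx != -1:
            return first_pos - 1 + idx
    return None


def polya_offset(fragment: str, first_pos: int, min_run: int = 10) -> Optional[int]:
    """0-based offset of the first uninterrupted run of at least min_run A's."""
    m = re.search("A{%d,}" % min_run, fragment.upper())
    return None if m is None else first_pos - 1 + m.start()


# Positions 1471-1512 of DKFZphfbr2_16f21
tail = "ATTAAATAAA" "AAGGAAAACC" "AGAAAAAAAA" "AAAAAAAAAAAA"
print(signal_offset(tail, 1471))  # 1474
print(polya_offset(tail, 1471))   # 1492
```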



1 GGGAGCAAGC AGGGGTTCGG CGGCATTACC TGTACCCATT CACCGGCGGC 
51 TACCGGCGGC GGCGCGTAGC GTGTCAGGCG GAGAGACCCG CCGCCAGGTG 
101 TGCAACTGAG GAACATGGCT CAAGAAACTA ATCACAGCCA AGTGCCTATG 
151 CTTTGTTCCA CTGGCTGTGG ATTTTATGGA AACCCTCGTA CAAATGGCAT 
201 GTGTTCAGTA TGCTATAAAG AACATCTTCA AAGACAGAAT AGTAGTAATG 
251 GTAGAATAAG CCCACCTGCA ACCTCTGTCA GTAGTCTGTC TGAATCTTTA 
301 CCAGTTCAAT GCACAGATGG CAGTGTGCCA GAAGCCCAGT CAGCATTAGA 
351 CTCTACATCT TCATCTATGC AGCCCAGCCC TGTATCAAAT CAGTCACTTT
401 TATCAGAATC TGTAGCATCT TCTCAATTGG ACAGTACATC TGTGGACAAA 
451 GCAGTACCTG AAACAGAAGA TGTGCAGGCT TCAGTATCAG ACACAGCACA 
501 GCAGCCATCT GAAGAGCAAA GCAAGCCTCT TGAAAAACCG AAACAAAAAA 
551 AGAATCGCTG TTTCATGTGC AGGAAGAAAG TGGGACTTAC TGGGTTTGAA 
601 TGCCGGTGTG GAAATGTTTA CTGTGGTGTA CACCGTTACT CAGATGTACT 
651 CAATTGCTCT TACAATTACA AAGCCGATGC TGCTGAGAAA ATCAGAAAAG 
701 AAAATCCAGT AGTTGTTGGT GAAAAGATCC AAAAGATTTG AACTCCTGCT 
751 GGAATACAAA ATTCTTGAGC ATCTGCAAAC TAAAAATTGA CTTGAGGTTT 
801 TTTTTTTCCT AGTCATTGGG AATGTAGAGC AGTGTATCTT GCATGTCATC 
851 GGAAGAATAG ATTTTTGTTT TGGTTTTGTT TTGAAAATGA CTCTGAACAT 
901 TTATTTCCAT TGCAATTTCT GTGGCTGAGG AGACTTAAAC TTTACAAGTA 
951 TTATCCTTTT AAGATCATTT TAATTTTAGT TGAGTGCAGA GGGCTTTTAT 
1001 AACAAACGTG CAGAAATTTT GGAGGGCTGT GATTTTTCCA GTATTAAACA 
1051 TGCATGCATT AATCTTGCAG TTTATTTTCT CATTATGTAT GTATATATCG 
1101 CTTTTCTCTG CAGCACGATT TCTCTTTTGA TAATGCCCTT TAGGGCACAA 
1151 CTAGTTATCA GTAACTGAAT GTATCTTAAT CATTATGGCT GCTTCTGTTT 
1201 TTTCATTAAC AAAGGTTATT CATATGTTAG CATATAGTTT CTTTGCACCC 
1251 ACTATTTATG TCTGAATCAT TTGTCACAAG AGAGTGTGTG CTGATGAGAT 
1301 TGTAAGTTTG TGTGTTTAAA CTTTTTTTTG AGCGAGGGAA GAAAAAGCTG 
1351 TATGCATTTC ATTGCTGTCT ACAGGTTTCT TTCAGATTAT GTTCATGGGT 
1401 TTGTGTGTAT ACAATATGAA GAATGATCTG AAGTAATTGT GCTGTATTTA 
1451 TGTTTATTCA CCAGTCTTTG ATTAAATAAA AAGGAAAACC AGAAAAAAAA 
1501 AAAAAAAAAA AA 



BLAST Results 



No BLAST result



Medline entries 



No Medline entry 



Peptide information for frame 1 
ORF from 115 bp to 738 bp; peptide length: 208
Category: strong similarity to known protein 
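The ORF coordinates in these entries are 1-based and inclusive, begin at the ATG and end before the stop codon: 115..738 spans 624 nt, i.e. 208 codons, and the TGA at 739-741 is not included. The same arithmetic gives 586, 984 and 185 residues for the other ORFs in this section, and (start - 1) mod 3 + 1 reproduces the frame numbers quoted (frame 1 for start 115; frame 3 for starts 240, 138 and 81). A minimal sketch using the standard genetic code; the helper names are illustrative, not part of any annotation package.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_length(start: int, end: int) -> int:
    """Peptide length for a 1-based, inclusive ORF that excludes the stop codon."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


def frame(start: int) -> int:
    """Forward reading frame (1, 2 or 3) of a 1-based start position."""
    return (start - 1) % 3 + 1


def translate(cds: str) -> str:
    """Translate DNA codon by codon; '*' marks a stop codon."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


# Positions 115-150 of DKFZphfbr2_16f21: the first 12 codons of the ORF
print(translate("ATGGCTCAAGAAACTAATCACAGCCAAGTGCCTATG"))  # MAQETNHSQVPM
print(orf_length(115, 738), frame(115))  # 208 1
```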



1 MAQETNHSQV PMLCSTGCGF YGNPRTNGMC SVCYKEHLQR QNSSNGRISP 
51 PATSVSSLSE SLPVQCTDGS VPEAQSALDS TSSSMQPSPV SNQSLLSESV 
101 ASSQLDSTSV DKAVPETEDV QASVSDTAQQ PSEEQSKPLE KPKQKKNRCF 
151 MCRKKVGLTG FECRCGNVYC GVHRYSDVLN CSYNYKADAA EKIRKENPVV
201 VGEKIQKI 

BLASTP hits 
Entry ATF7H19_1 from database TREMBLNEW: 

gene: "F7H19.10"; product: "putative protein"; Arabidopsis thaliana DNA
chromosome 4, BAC clone F7H19 (ESSAII project) >TREMBL:ATT12H17_21
gene: "T12H17.210"; product: "predicted protein"; Arabidopsis thaliana
DNA chromosome 4, BAC clone T12H17 (ESSAII project)
Score = 206, P = 2.1e-24, identities = 51/146, positives = 77/146

Entry PVPVPR3A_1 from database TREMBL: 

gene: "PVPR3"; P. vulgaris PVPR3 protein mRNA, complete cds.
Score = 237, P = 4.9e-20, identities = 50/136, positives = 73/136

Entry AF062072_1 from database TREMBL: 

gene: "ZNF216"; product: "zinc finger protein 216"; Homo sapiens zinc 
finger protein 216 (ZNF216) gene, complete cds. 

Score = 591, P = 1.6e-57, identities = 124/215, positives = 147/215



Alert BLASTP hits for DKFZphfbr2_16f21, frame 1

TREMBL:AF062071_1 product: "zinc finger protein ZNF216"; Mus musculus
zinc finger protein ZNF216 mRNA, complete cds., N = 1, Score = 590,
P = 2.1e-57

TREMBLNEW:AB001773_1 gene: "pem-6"; product: "PEM-6"; Ciona savignyi
pem-6 (posterior end mark 6) mRNA, complete cds., N = 1, Score = 421,
P = 1.7e-39



>TREMBL:AF062071_1 product: "zinc finger protein ZNF216"; Mus musculus zinc
finger protein ZNF216 mRNA, complete cds.
Length = 213

HSPs: 



Score = 590 (88.5 bits), Expect = 2.1e-57, P = 2.1e-57
Identities = 123/213 (57%), Positives = 146/213 (68%)

Query:    1 MAQETNHSQVPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQNSSNGRISPPAT---SVSS 57
            MAQETN +  PMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQ +S GR+SP  T   S S
Sbjct:    1 MAQETNQTPGPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQQNS-GRMSPMGTASGSNSP 59

Query:   58 LSESLPVQCTDGSVPEAQSALDSTSSSMQPSPVSNQSLLSE--SVASSQLDSTSVDKAVP 115
             S+S  VQ  D  +   +  A  STS   +  PV+   +  +   ++ S+ D  +  K
Sbjct:   60 TSDSASVQRADAGLNNCEGAAGSTSEKSRNVPVAALPVTQQMTEMSISREDKITTPKT-E 118

Query:  116 ETEDVQASVSDTAQQPSEEQS--KPLEKPKQKKNRCFMCRKKVGLTGFECRCGNVYCGVH 173
             +E V    S +  QPS  QS  K  E PK KKNRCFMCRKKVGLTGF+CRCGN++CG+H
Sbjct:  119 VSEPVVTQPSPSVSQPSSSQSEEKAPELPKPKKNRCFMCRKKVGLTGFDCRCGNLFCGLH 178

Query:  174 RYSDVLNCSYNYKADAAEKIRKENPVVVGEKIQKI 208
            RYSD  NC Y+YKA+AA KIRKENPVVV EKIQ+I
Sbjct:  179 RYSDKHNCPYDYKAEAAAKIRKENPVVVAEKIQRI 213





Pedant information for DKFZphfbr2_16f21, frame 1



Report for DKFZphfbr2_16f21.1



[LENGTH] 208

[MW] 22541.23

[pI] 6.80

[HOMOL] TREMBL:AF062072_1 gene: "ZNF216"; product: "zinc finger protein 216"; Homo
sapiens zinc finger protein 216 (ZNF216) gene, complete cds. 9e-57

[PIRKW] zinc 8e-13

[PIRKW] zinc finger 8e-13

[PIRKW] fusion protein 8e-13

[SUPFAM] unassigned ubiquitin-related proteins 8e-13

[SUPFAM] ubiquitin homology 8e-13

[PROSITE] MYRISTYL 2

[PROSITE] CK2_PHOSPHO_SITE 7

[PROSITE] ASN_GLYCOSYLATION 4

[KW] irregular

[KW] LOW_COMPLEXITY 7.21 %



SEQ MAQETNHSQVPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQNSSNGRISPPATSVSSLSE

SEG

PRD ccccccccccccccccccccccccccccccchhhhhhhhhhccccccccccccccccccc

SEQ SLPVQCTDGSVPEAQSALDSTSSSMQPSPVSNQSLLSESVASSQLDSTSVDKAVPETEDV

SEG xxxxxxxxxxxxxxx

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

SEQ QASVSDTAQQPSEEQSKPLEKPKQKKNRCFMCRKKVGLTGFECRCGNVYCGVHRYSDVLN

SEG

PRD cccccccccccccccccccccccccccceeecccccccceeecccccccccccccccccc

SEQ CSYNYKADAAEKIRKENPVVVGEKIQKI

SEG

PRD ccchhhhhhhhhhhhhcccccccccccc



Prosite for DKFZphfbr2_16f21.1



PS00001  6->10     ASN_GLYCOSYLATION  PDOC00001
PS00001  42->46    ASN_GLYCOSYLATION  PDOC00001
PS00001  92->96    ASN_GLYCOSYLATION  PDOC00001
PS00001  180->184  ASN_GLYCOSYLATION  PDOC00001
PS00006  57->61    CK2_PHOSPHO_SITE   PDOC00006
PS00006  70->74    CK2_PHOSPHO_SITE   PDOC00006
PS00006  76->80    CK2_PHOSPHO_SITE   PDOC00006
PS00006  103->107  CK2_PHOSPHO_SITE   PDOC00006
PS00006  108->112  CK2_PHOSPHO_SITE   PDOC00006
PS00006  123->127  CK2_PHOSPHO_SITE   PDOC00006
PS00006  159->163  CK2_PHOSPHO_SITE   PDOC00006
PS00008  22->28    MYRISTYL           PDOC00008
PS00008  166->172  MYRISTYL           PDOC00008
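The match ranges in these PROSITE tables appear to use a 1-based start and an end one past the last matched residue: N-{P}-[ST]-{P} (PS00001) spans four residues, yet its hit at N6-H7-S8-Q9 is listed as 6->10. The sketch below hand-translates four PROSITE patterns into Python regular expressions and, for the 208-residue DKFZphfbr2_16f21.1 peptide, reproduces the four ASN_GLYCOSYLATION, seven CK2_PHOSPHO_SITE and two MYRISTYL ranges listed, with no PKC_PHOSPHO_SITE hit. The translation and the overlapping-match search are this sketch's own choices; any extra rules applied by the original scanning software are not modelled.

```python
import re

# PROSITE patterns hand-translated to Python regular expressions:
# {X} (any residue except X) -> [^X], x -> ., x(n) -> .{n}
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",        # PS00001 N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",              # PS00005 [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",           # PS00006 [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",  # PS00008 G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}


def scan(peptide: str, regex: str) -> list[tuple[int, int]]:
    """(start, end) pairs with a 1-based start and end one past the match.

    The zero-width lookahead lets overlapping matches be reported.
    """
    hits = []
    for m in re.finditer(f"(?=({regex}))", peptide):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


# DKFZphfbr2_16f21, frame 1 (residues 199-201 read as VVV)
F21 = ("MAQETNHSQVPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQNSSNGRISP"
       "PATSVSSLSESLPVQCTDGSVPEAQSALDSTSSSMQPSPVSNQSLLSESV"
       "ASSQLDSTSVDKAVPETEDVQASVSDTAQQPSEEQSKPLEKPKQKKNRCF"
       "MCRKKVGLTGFECRCGNVYCGVHRYSDVLNCSYNYKADAAEKIRKENPVV"
       "VGEKIQKI")

for name, regex in PATTERNS.items():
    print(name, scan(F21, regex))
```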



(No Pfam data available for DKFZphfbr2_16f21.1)
DKFZphfbr2_16gl8



group: cell cycle 

DKFZphfbr2_16gl8.3 encodes a novel 984 amino acid protein with similarity to centromeric
proteins of yeasts. 

The novel protein shows similarity to S. pombe SPAC17A5.07c and to S. cerevisiae Smt4p, a
suppressor of MIF2. MIF2 encodes a centromeric protein with homology to the mammalian
centromeric protein CENP-C. Mutations in MIF2 stabilise dicentric minichromosomes and confer
high instability on chromosomes that bear a cis-acting mutation in element I of the yeast
centromeric DNA (CDEI). The new protein is therefore likely to be involved in centromere
organisation as well.

The new protein can find application in modulating/blocking the cell cycle and in influencing the
behavior of both natural and artificial chromosomes in eukaryotic cells.



similarity to KIAA0797 and yeast Smt4p 
complete cDNA, complete cds, EST hits 

the yeast Smt4 protein seems to be involved in centromere function
and microtubule organisation

Sequenced by Qiagen 

Locus: unknown 

Insert length: 4826 bp 

Poly A stretch at pos. 4756, polyadenylation signal at pos. 4736 



1 GGGTCGAGGT CGACGGTATC GATAAGTTTT TTTTTTTTTT TTTTTTTTTT 
51 TTTTCCTTTC CCCTCCCCCT CCCTCTCCAA GCCGGAGGGG TCCTGAGGTG 
101 ACAGCGCCTG CAACTGAAAT TTCAGCAGCG GGAGAAGATG GACAAGAGAA 
151 AGCTCGGGCG ACGGCCATCT TCATCCGAAA TCATCACAGA AGGAAAAAGG 
201 AAAAAGTCAT CTTCTGATTT ATCGGAGATA AGAAAGATGT TAAATGCAAA 
251 ACCAGAGGAT GTCCATGTTC AATCACCACT GTCCAAATTC AGAAGCTCAG 
301 AACGCTGGAC TCTCCCTTTG CAGTGGGAAA GAAGCCTAAG GAATAAAGTC 
351 ATCTCTCTAG ACCATAAAAA TAAAAAACAT ATCCGAGGGT GTCCTGTTAC 
401 TTCCAGGTCA TCACCAGAAA GGATACCCAG AGTTATATTG ACGAATGTCC 
451 TGGGAACGGA GTTAGGAAGA AAATACATAA GGACCCCACC TGTAACTGAG 
501 GGAAGTTTGA GTGATACAGA CAACTTGCAA TCAGAGCAAC TTTCTTCATC 
551 ATCTGATGGC AGCCTAGAAT CTTATCAAAA TCTAAACCCT CACAAGAGCT 
601 GTTATTTATC TGAAAGGGGC TCACAACGAA GTAAGACAGT AGATGACAAT 
651 TCTGCAAAGC AGACTGCGCA CAATAAAGAA AAACGAAGAA AGGATGATGG 
701 CATTTCTCTT TTAATATCTG ATACTCAGCC TGAAGACCTT AACAGTGGAA 
751 GTAGAGGTTG TGATCATCTC GAACAGGAAA GCAGAAACAA GGATGTTAAA 
801 TATTCTGATT CAAAAGTGGA ACTCACTCTG ATTTCCAGGA AGACAAAGAG 
851 AAGGCTTAGA AATAATTTAC CTGATTCTCA ATATTGTACT TCTTTGGATA 
901 AGTCAACAGA ACAGACAAAA AAACAAGAAG ATGACTCAAC AATATCCACT 
951 GAGTTTGAAA GGCCAAGTGA AAACTATCAT CAGGATCCAA AACTGCCTGA 
1001 AGAAATTACA ACTAAACCTA CAAAAAGTGA TTTTACTAAG CTATCCTCAC 
1051 TTAACAGTCA GGAGTTGACT TTGAGTAATG CCACCAAAAG TGCCTCTGCC 
1101 GGTTCAACCA CTGAAACCGT TGAGTACTCT AATTCCATTG ATATTGTGGG 
1151 GATTTCTTCC CTGGTTGAGA AGGATGAGAA TGAGTTGAAT ACCATAGAAA
1201 AGCCTATTCT AAGAGGACAT AATGAAGGGA ACCAATCACT GATCTCAGCT 
1251 GAACCAATTG TTGTTTCCAG TGATGAAGAA GGACCTGTTG AACATAAAAG 
1301 TTCAGAAATT CTTAAGTTAC AATCTAAGCA AGACCGTGAG ACAACTAATG 
1351 AAAATGAGAG TACTTCTGAA TCAGCATTGT TAGAACTACC ATTGATTACA 
1401 TGTGAATCTG TACAGATGTC ATCTGAATTA TGCCCATATA ATCCTGTCAT 
1451 GGAGAACATT TCCAGTATTA TGCCTAGTAA TGAGATGGAT CTACAACTGG 
1501 ATTTTATATT TACTTCTGTT TATATTGGTA AAATAAAAGG AGCTTCTAAA 
1551 GGTTGTGTTA CAATCACAAA AAAATATATT AAGATCCCAT TTCAAGTGTC 
1601 CCTGAATGAG ATTTCATTGC TAGTGGATAC CACACATTTA AAGCGGTTTG 
1651 GGTTATGGAA AAGTAAGGAT GATAATCACA GTAAAAGGAG TCATGCTATT 
1701 CTTTTCTTCT GGGTCTCTTC AGATTATCTT CAAGAGATTC AGACCCAATT 
1751 AGAACACTCT GTATTAAGCC AGCAATCAAA ATCTAGTGAA TTCATTTTCC 
1801 TTGAACTACA CAATCCTGTT TCACAGAGAG AAGAATTGAA GCTGAAAGAT 
1851 ATTATGACGG AAATAAGTAT AATCAGTGGA GAATTAGAGC TTTCTTACCC 
1901 GTTGTCTTGG GTTCAGGCAT TTCCTTTGTT TCAGAACCTC TCTTCAAAAG 
1951 AAAGTTCTTT TATTCATTAT TACTGTGTTT CAACTTGTTC TTTCCCTGCT 
2001 GGTGTTGCTG TTGCTGAAGA AATGAAGCTG AAATCAGTAT CTCAGCCCTC 
2051 AAACACAGAT GCGGCCAAGC CTACTTACAC CTTCCTGCAG AAGCAAAGTA 
2101 GCGGTTGCTA CTCCCTTTCT ATTACATCTA ATCCAGATGA AGAATGGCGG 
2151 GAAGTCAGGC ACACTGGACT TGTTCAGAAG TTGATTGTAT ATCCTCCACC 
2201 ACCTACTAAG GGGGGATTGG GAGTAACTAA TGAAGATCTG GAGTGTTTAG 
2251 AAGAAGGAGA GTTTCTTAAT GATGTAATCA TTGATTTTTA CCTTAAGTAT 
2301 CTTATATTGG AGAAGGCATC AGATGAACTT GTTGAACGAA GTCACATTTT 
2351 TAGTAGCTTT TTCTATAAAT GCTTGACAAG AAAGGAAAAT AATTTAACAG
2401 AAGATAATCC AAATCTTTCA ATGGCACAGA GAAGACATAA AAGAGTAAGA 
2451 ACATGGACTC GTCACATAAA CATTTTTAAT AAAGATTACA TCTTTGTACC 
2501 TGTAAATGAG TCGTCTCACT GGTATCTCGC AGTCATTTGT TTTCCATGGT 
2551 TAGAAGAAGC TGTGTATGAA GATTTTCCAC AAACTGTATC CCAGCAGTCC 
2601 CAGGCTCAGC AGTCCCAAAG TGACAACAAA ACAATAGATA ATGATCTACG 
2651 TACTACTTCG ACACTGTCTT TGAGTGCAGA GGATTCCCAA AGTACCGAGT 
2701 CGAATATGTC AGTACCAAAG AAAATGTGTA AAAGGCCATG TATTCTTATA 
2751 CTAGACTCCT TGAAAGCTGC TTCTGTACGA AACACAGTTC AGAATTTACG 
2801 AGAGTATTTA GAGGTAGAGT GGGAAGTTAA ACTAAAAACT CATCGTCAAT 
2851 TCAGCAAAAC AAACATGGTG GATCTATGCC CTAAAGTTCC TAAACAGGAC 
2901 AATAGCAGTG ATTGTGGAGT ATATTTATTG CAGTATGTGG AAAGCTTCTT 
2951 CAAGGATCCT ATTGTTAACT TTGAACTTCC AATTCATTTG GAGAAGTGGT 
3001 TTCCTCGTCA TGTAATAAAG ACCAAACGGG AAGATATTCG AGAGCTCATC 
3051 TTGAAACTTC ATTTACAGCA ACAGAAGGGC AGCAGTAGCT AGTTAATCTG 
3101 TACAAACATG ACACAGATGT TCTCTAAGAT TACTGGAAAG CCCCTTACCA 
3151 GCATTTGTGT TAGCCAGCTC ACAGAGAAGA AAATAACTTG CAGTAGTTTT 
3201 ATAATAAGTC ATTGGAACAT TATTTAAAAT ATGTAGGACA CATTATTAGA 
3251 ATTGTTGGGA TCTCATAGAT GGAATGGGAA TGGGGGTGAT ATAGATAAAC 
3301 TTACTAGATA TAAATTAAAA TTTTATAAAT ATTTCATATT TTTCTGAGTA 
3351 AATATGATTG GATTATGCAA CAGCATATGT AATATGGGAA TGTTTTGTAG 
3401 ATAATAAAAC TTACATGATC TGTACTTCCA CGTGACTGGG TGCTGAGGGG 
3451 AGTTAAAGCC TCCCTGGTGC CAGCCCCAGT GCTTGTCAAA TTTGCTGACA 
3501 GGTCACATCA TATTGTAATT CTATTCTTTG CAGCTCAAGC ATGCAGTATG 
3551 AATACTGTGT ATTTTTTAAA AAAATAATTT AGTATCAAGG CTTCAGAAAA 
3601 TGCCATTTAC GGCATCCCTT CTGTATGTAA CAAAAAGACA TTCATAATGT 
3651 TAGGAAGATG ATAAAAATTC GCTCTTTTAA AGTGCAGCTT ATTATTCTCA 
3701 ATTGCTAAAT ACGATTACTC TGCTTTTTTT TTTTCATTTC TTTTGATGTC 
3751 ATATGTGAGT ATCTTATAAT TTAGTTCATT TGTTCAGGGT AAAATTTGAA 
3801 ACAAAAAATT TTACCTGTGC AAAATAGTTT TTTAAAAATT ATACATGTAG 
3851 CTCAACTTGA GGTACTGCTA TATAAATATT CACTCACATT ATCACGGAAT 
3901 TTATGTATAG TTTCTCTAAT ATAGAAGATA AAATTGGTGT CCTCATAACT 
3951 TTAACAAAGA AAACCCTCAG TCCTATTTAT TAATGGGTAG AATTAAATAT 
4001 ATAATTTTAT AGCTCAGTTT ACCCAGTATT CATCTGCAAA GCCAGATTGC
4051 TCTCATTGCT TTTATATTTT TAAATTGTAG CTTTTAGAGA CCTATGATCC 
4101 TCATGGAACT TAATTTTTTA TTAAATATTC AGGTAACAGT TCTGAATTCA 
4151 TGTGATAATG GTGGCATTAT ATATGATTAA ACACTTCAGA ACTTTCTAAT 
4201 GTTATCAGGA GTATTTTGAG GGAGATATGA TTATATTGTA TTTTCTCAGA 
4251 TAAGAAAAAT GTTTTTTAAC AATATTATTT TAATCTGTTT TAAGCATCTC 
4301 TTAGATTTAC ATTATAACTA CATAAAGCAG TGAAGCAAAG GCAAATTAAG 
4351 ATAAAGCTAG AAAGTCTGAA CATTTTATTT CAAAATCATA CGAATCGGGG 
4401 TCAGTTAAGC CTCAGTATTC TTAGCTTTTG TTGATTTTGG CACTATCTTT 
4451 ATATTATTAA ATATATTTGT TGTTTGGATA TTTCATATAA AGATGGCTAT 
4501 AATTACATAT TTCATTCCCA ATTTGTGTGT GTTGGGGGGT ACTTTTAAAG 
4551 GTGACTATTG TTTTGTACAT CTAATTTTGG GAAACCAAGT CTATAAGACA 
4601 TCTTGTGATT TCTTAATGTT TTTGTTTGTA TGTTTTTCAA AGATATCACT
4651 GTCCTTTATC ATGTTTTGAA GATTGTTTAA AATTCATTTT CCTAAATTAA 
4701 TGTGCAAGTA ATGTTTTGAG GATATCGGTG TTTTATATTA AACATATTTC 
4751 CAATTCAAAA AAAAAAAAAA AAAAACTTAT CGATACCGTC GACCTCGATG 
4801 ATGATGATGA TGATGATGAT GTCGAC 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 138 bp to 3089 bp; peptide length: 984
Category: similarity to known protein 



1 MDKRKLGRRP SSSEIITEGK RKKSSSDLSE IRKMLNAKPE DVHVQSPLSK 

51 FRSSERWTLP LQWERSLRNK VISLDHKNKK HIRGCPVTSR SSPERIPRVI 

101 LTNVLGTELG RKYIRTPPVT EGSLSDTDNL QSEQLSSSSD GSLESYQNLN 

151 PHKSCYLSER GSQRSKTVDD NSAKQTAHNK EKRRKDDGIS LLISDTQPED 

201 LNSGSRGCDH LEQESRNKDV KYSDSKVELT LISRKTKRRL RNNLPDSQYC

251 TSLDKSTEQT KKQEDDSTIS TEFERPSENY HQDPKLPEEI TTKPTKSDFT 

301 KLSSLNSQEL TLSNATKSAS AGSTTETVEY SNSIDIVGIS SLVEKDENEL 

351 NTIEKPILRG HNEGNQSLIS AEPIVVSSDE EGPVEHKSSE ILKLQSKQDR

401 ETTNENESTS ESALLELPLI TCESVQMSSE LCPYNPVMEN ISSIMPSNEM 

451 DLQLDFIFTS VYIGKIKGAS KGCVTITKKY IKIPFQVSLN EISLLVDTTH 
501 LKRFGLWKSK DDNHSKRSHA ILFFWVSSDY LQEIQTQLEH SVLSQQSKSS
551 EFIFLELHNP VSQREELKLK DIMTEISIIS GELELSYPLS WVQAFPLFQN 
601 LSSKESSFIH YYCVSTCSFP AGVAVAEEMK LKSVSQPSNT DAAKPTYTFL 
651 QKQSSGCYSL SITSNPDEEW REVRHTGLVQ KLIVYPPPPT KGGLGVTNED 
701 LECLEEGEFL NDVIIDFYLK YLILEKASDE LVERSHIFSS FFYKCLTRKE 
751 NNLTEDNPNL SMAQRRHKRV RTWTRHINIF NKDYIFVPVN ESSHWYLAVI 
801 CFPWLEEAVY EDFPQTVSQQ SQAQQSQSDN KTIDNDLRTT STLSLSAEDS 
851 QSTESNMSVP KKMCKRPCIL ILDSLKAASV RNTVQNLREY LEVEWEVKLK 
901 THRQFSKTNM VDLCPKVPKQ DNSSDCGVYL LQYVESFFKD PIVNFELPIH 
951 LEKWFPRHVI KTKREDIREL ILKLHLQQQK GSSS 

BLASTP hits 

Entry SPAC17A5_7 from database TREMBL:

"SPAC17A5.07c"; product: "hypothetical protein"; S.pombe 
chromosome I cosmid c17A5. Schizosaccharomyces pombe (fission
yeast) 

Length = 652 

Score = 275 (96.8 bits), Expect = 1.9e-29, Sum P(3) = 1.9e-29 
Identities = 56/120 (46%), Positives = 78/120 (65%)

Entry S49947 from database PIR: 

SMT4 protein - yeast (Saccharomyces cerevisiae) 

Length = 1034 

Score = 163 (57.4 bits), Expect = 4.6e-16, Sum P(3) = 4.6e-16
Identities = 46/159 (28%), Positives = 76/159 (47%) 

Entry YQG6_CAEEL from database SWISSPROT: 
HYPOTHETICAL 35.7 KD PROTEIN C41C4.6 IN CHROMOSOME II. 
Length = 342

Score = 162 (57.0 bits), Expect = 6.1e-13, Sum P(3) = 6.1e-13
Identities = 37/119 (31%), Positives = 62/119 (52%)

Entry AB018340_1 from database TREMBL: 

gene: "KIAA0797"; product: "KIAA0797 protein"; Homo sapiens mRNA for 

KIAA0797 protein, partial cds. 

Score = 540, P = 1.9e-50, identities = 120/243, positives = 155/243



Alert BLASTP hits for DKFZphfbr2_16gl8, frame 3 

TREMBL:ATT16L1_11 gene: "T16L1.110"; product: "putative protein";
Arabidopsis thaliana DNA chromosome 4, BAC clone T16L1 (ESSAII
project), N = 2, Score = 239, P = 2.1e-18



>TREMBL:ATT16L1_11 gene: "T16L1.110"; product: "putative protein";

Arabidopsis thaliana DNA chromosome 4, BAC clone T16L1 (ESSAII project) 
Length = 710

HSPs: 



Score = 239 (35.9 bits), Expect = 2.1e-18, Sum P(2) = 2.1e-18
Identities = 51/135 (37%), Positives = 78/135 (57%)

Query:  683 IVYPPPPTKGGLGVTNEDLECLEEGEFLNDVIIDFYLKYLILEKASDELVERSHIFSSFF 742
            +VYP       + V  +D+E L+   F+ND IIDFY+KYL   + S +   R H F+ FF
Sbjct:  176 LVYPQGEPDAVV-VRKQDIELLKPRRFINDTIIDFYIKYL-KNRISPKERGRFHFFNCFF 233

Query:  743 YKCLTRKENNLTEDNPNLSMAQRRHKRVRTWTRHINIFNKDYIFVPVNESSHWYLAVICF 802
            +    RK  NL +  P+    +  ++RV+ WT+++++F KDYIF+P+N S HW L +IC
Sbjct:  234 F----RKLANLDKGTPSTCGGREAYQRVQKWTKNVDLFEKDYIFIPINCSFHWSLVIICH 289

Query:  803 PWLEEAVYEDFPQTV 817
            P      + + PQ V
Sbjct:  290 PGELVPSHVENPQRV 304

Score = 70 (10.5 bits), Expect = 2.1e-18, Sum P(2) = 2.1e-18
Identities = 13/28 (46%), Positives = 15/28 (53%)

Query:  948 PIHLEKWFPRHVIKTKREDIRELILKLH 975
            P HL  WFP      KR +I EL+  LH
Sbjct:  403 PSHLRNWFPAKEASLKRRNILELLYNLH 430





Pedant information for DKFZphfbr2_16gl8, frame 3 



Report for DKFZphfbr2_16gl8.3
[LENGTH] 984

[MW] 112265.80

[pI] 6.13

[HOMOL] TREMBL:AB018340_1 gene: "KIAA0797"; product: "KIAA0797 protein"; Homo sapiens
mRNA for KIAA0797 protein, partial cds. 8e-53

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL031w] 9e-17

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL020c] 4e-06

[BLOCKS] BL00494C Bacterial luciferase subunits proteins

[PROSITE] AMIDATION 3

[PROSITE] MYRISTYL 9

[PROSITE] CAMP_PHOSPHO_SITE 2

[PROSITE] CK2_PHOSPHO_SITE 30

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 19

[PROSITE] ASN_GLYCOSYLATION 12

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 4.47 %
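The [MW] entry is presumably the average molecular mass of the unmodified chain, i.e. the sum of average residue masses plus one water for the free termini. The report does not say which mass table was used, so the sketch below assumes the average residue masses tabulated by ExPASy. It illustrates the calculation and is not expected to reproduce 112265.80 to the last decimal.

```python
# Average residue masses in daltons (amino acid minus water), as tabulated
# by ExPASy; assumed here, not taken from the report.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mass(peptide: str) -> float:
    """Average molecular mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER


print(round(average_mass("G"), 2))   # 75.07 (free glycine)
print(round(average_mass("GG"), 2))  # 132.12 (glycylglycine)
```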



SEQ MDKRKLGRRPSSSEIITEGKRKKSSSDLSEIRKMLNAKPEDVHVQSPLSKFRSSERWTLP 

SEG 

PRD ccccceeecccceeeeecccccccccchhhhhhhhhhccccccccccccccccccccchh 

SEQ LQWERSLRNKVISLDHKNKKHIRGCPVTSRSSPERIPRVILTNVLGTELGRKYIRTPPVT 

SEG 

PRD hhhhhhhhhheeeeccccceeeccccccccccccceeeeeeeeeccceeeccceeecccc 

SEQ EGSLSDTDNLQSEQLSSSSDGSLESYQNLNPHKSCYLSERGSQRSKTVDDNSAKQTAHNK 

SEG xxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhhhh 

SEQ EKRRKDDGISLLISDTQPEDLNSGSRGCDHLEQESRNKDVKYSDSKVELTLISRKTKRRL 

SEG 

PRD hhhhcccceeeeecccccccccccccccccccccccccccccccccceeeeeehhhhhhh 

SEQ RNNLPDSQYCTSLDKSTEQTKKQEDDSTISTEFERPSENYHQDPKLPEEITTKPTKSDFT 

SEG 

PRD hccccccccccccccccchhhhhccccccccccccccccccccccccccccccccccccc 

SEQ KLSSLNSQELTLSNATKSASAGSTTETVEYSNSIDIVGISSLVEKDENELNTIEKPILRG 

SEG 

PRD ccccccccceeehhhhhhhcccccceeeeccceeeceeeccchhhhhhhhhhhccccccc 

SEQ HNEGNQSLISAEPIVVSSDEEGPVEHKSSEILKLQSKQDRETTNENESTSESALLELPLI 

SEG xxxxxxxxxxxxxxxxx

PRD cccccceeeecceeeeecccccccccchhhhhhhhhhhhhhcccccccchhhhhccccce 

SEQ TCESVQMSSELCPYNPVMENISSIMPSNEMDLQLDFIFTSVYIGKIKGASKGCVTITKKY 

SEG 

PRD eecccccccccccccccccceeeccccchhhhhhheeeeeeeeeeeeccccceeeeeeee 

SEQ IKIPFQVSLNEISLLVDTTHLKRFGLWKSKDDNHSKRSHAILFFWVSSDYLQEIQTQLEH 

SEG 

PRD eeeeccccceeeeeeecccceeeeeeeecccccccccceeeeeeeeccchhhhhhhhhhh 

SEQ SVLSQQSKSSEFIFLELHNPVSQREELKLKDIMTEISIISGELELSYPLSWVQAFPLFQN 

SEG 

PRD hhhhccccceeeeeeeeccccccchhhhhhhhhheeeeeccceeeeccceeeeeeceeec 

SEQ LSSKESSFIHYYCVSTCSFPAGVAVAEEMKLKSVSQPSNTDAAKPTYTFLQKQSSGCYSL 

SEG 

PRD ccccccccceeeeecccccccchhhhhhhhhhhcccccccccccccceeeecccccccce 

SEQ SITSNPDEEWREVRHTGLVQKLIVYPPPPTKGGLGVTNEDLECLEEGEFLNDVIIDFYLK 

SEG

PRD eeccccccceeeeeeccceeeeeeecccccccccccccchhhhhhhhccchhhhhhhhhh 

SEQ YLILEKASDELVERSHIFSSFFYKCLTRKENNLTEDNPNLSMAQRRHKRVRTWTRHINIF 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhc 

SEQ NKDYIFVPVNESSHWYLAVICFPWLEEAVYEDFPQTVSQQSQAQQSQSDNKTIDNDLRTT 

SEG xxxxxxxxxxx 

PRD cceeeeeccccccceeeeeeeccchhhhhhhccccchhhhhhhhhhcccccccccccccc 

SEQ STLSLSAEDSQSTESNMSVPKKMCKRPCILILDSLKAASVRNTVQNLREYLEVEWEVKLK 

SEG 

PRD cceeeeecccccceeeccccccccccceeeeeccccccccchhhhhhhhhhhhhhhhhhh 

SEQ THRQFSKTNMVDLCPKVPKQDNSSDCGVYLLQYVESFFKDPIVNFELPIHLEKWFPRHVI 
SEG

PRD hhhhhccccccccccccccccccccceeeeehhhhhhhcccceeecccccccccccchhh 

SEQ KTKREDIRELILKLHLQQQKGSSS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccc 



Prosite for DKFZphfbr2_16gl8.3



PS00001  314->318  ASN_GLYCOSYLATION  PDOC00001
PS00001  365->369  ASN_GLYCOSYLATION  PDOC00001
PS00001  406->410  ASN_GLYCOSYLATION  PDOC00001
PS00001  440->444  ASN_GLYCOSYLATION  PDOC00001
PS00001  513->517  ASN_GLYCOSYLATION  PDOC00001
PS00001  600->604  ASN_GLYCOSYLATION  PDOC00001
PS00001  752->756  ASN_GLYCOSYLATION  PDOC00001
PS00001  759->763  ASN_GLYCOSYLATION  PDOC00001
PS00001  790->794  ASN_GLYCOSYLATION  PDOC00001
PS00001  830->834  ASN_GLYCOSYLATION  PDOC00001
PS00001  856->860  ASN_GLYCOSYLATION  PDOC00001
PS00001  922->926  ASN_GLYCOSYLATION  PDOC00001
PS00004  8->12     CAMP_PHOSPHO_SITE  PDOC00004
PS00004  21->25    CAMP_PHOSPHO_SITE  PDOC00004
PS00005  54->57    PKC_PHOSPHO_SITE   PDOC00005
PS00005  66->69    PKC_PHOSPHO_SITE   PDOC00005
PS00005  88->91    PKC_PHOSPHO_SITE   PDOC00005
PS00005  158->161  PKC_PHOSPHO_SITE   PDOC00005
PS00005  162->165  PKC_PHOSPHO_SITE   PDOC00005
PS00005  172->175  PKC_PHOSPHO_SITE   PDOC00005
PS00005  233->236  PKC_PHOSPHO_SITE   PDOC00005
PS00005  236->239  PKC_PHOSPHO_SITE   PDOC00005
PS00005  260->263  PKC_PHOSPHO_SITE   PDOC00005
PS00005  291->294  PKC_PHOSPHO_SITE   PDOC00005
PS00005  477->480  PKC_PHOSPHO_SITE   PDOC00005
PS00005  515->518  PKC_PHOSPHO_SITE   PDOC00005
PS00005  562->565  PKC_PHOSPHO_SITE   PDOC00005
PS00005  602->605  PKC_PHOSPHO_SITE   PDOC00005
PS00005  747->750  PKC_PHOSPHO_SITE   PDOC00005
PS00005  874->877  PKC_PHOSPHO_SITE   PDOC00005
PS00005  879->882  PKC_PHOSPHO_SITE   PDOC00005
PS00005  901->904  PKC_PHOSPHO_SITE   PDOC00005
PS00005  962->965  PKC_PHOSPHO_SITE   PDOC00005
PS00006  11->15    CK2_PHOSPHO_SITE   PDOC00006
PS00006  24->28    CK2_PHOSPHO_SITE   PDOC00006
PS00006  91->95    CK2_PHOSPHO_SITE   PDOC00006
PS00006  123->127  CK2_PHOSPHO_SITE   PDOC00006
PS00006  125->129  CK2_PHOSPHO_SITE   PDOC00006
PS00006  137->141  CK2_PHOSPHO_SITE   PDOC00006
PS00006  167->171  CK2_PHOSPHO_SITE   PDOC00006
PS00006  196->200  CK2_PHOSPHO_SITE   PDOC00006
PS00006  225->229  CK2_PHOSPHO_SITE   PDOC00006
PS00006  251->255  CK2_PHOSPHO_SITE   PDOC00006
PS00006  271->275  CK2_PHOSPHO_SITE   PDOC00006
PS00006  295->299  CK2_PHOSPHO_SITE   PDOC00006
PS00006  323->327  CK2_PHOSPHO_SITE   PDOC00006
PS00006  341->345  CK2_PHOSPHO_SITE   PDOC00006
PS00006  377->381  CK2_PHOSPHO_SITE   PDOC00006
PS00006  396->400  CK2_PHOSPHO_SITE   PDOC00006
PS00006  402->406  CK2_PHOSPHO_SITE   PDOC00006
PS00006  408->412  CK2_PHOSPHO_SITE   PDOC00006
PS00006  488->492  CK2_PHOSPHO_SITE   PDOC00006
PS00006  509->513  CK2_PHOSPHO_SITE   PDOC00006
PS00006  536->540  CK2_PHOSPHO_SITE   PDOC00006
PS00006  562->566  CK2_PHOSPHO_SITE   PDOC00006
PS00006  602->606  CK2_PHOSPHO_SITE   PDOC00006
PS00006  638->642  CK2_PHOSPHO_SITE   PDOC00006
PS00006  664->668  CK2_PHOSPHO_SITE   PDOC00006
PS00006  697->701  CK2_PHOSPHO_SITE   PDOC00006
PS00006  747->751  CK2_PHOSPHO_SITE   PDOC00006
PS00006  826->830  CK2_PHOSPHO_SITE   PDOC00006
PS00006  846->850  CK2_PHOSPHO_SITE   PDOC00006
PS00006  962->966  CK2_PHOSPHO_SITE   PDOC00006
PS00007  216->223  TYR_PHOSPHO_SITE   PDOC00007
PS00008  84->90    MYRISTYL           PDOC00008
PS00008  106->112  MYRISTYL           PDOC00008
PS00008  141->147  MYRISTYL           PDOC00008
PS00008  161->167  MYRISTYL           PDOC00008
PS00008  204->210  MYRISTYL           PDOC00008
PS00008  468->474  MYRISTYL           PDOC00008



142 



WO 01/12659 



PCT/IB00/01496 



PS00008  505->511  MYRISTYL           PDOC00008
PS00008  622->628  MYRISTYL           PDOC00008
PS00008  693->699  MYRISTYL           PDOC00008
PS00009  6->10     AMIDATION          PDOC00009
PS00009  18->22    AMIDATION          PDOC00009
PS00009  109->113  AMIDATION          PDOC00009



(No Pfam data available for DKFZphfbr2_16gl8.3)
DKFZphfbr2_16il2



group: transmembrane protein 

DKFZphfbr2_16il2 encodes a novel 185 amino acid protein with strong similarity to the PUT2
protein of Fugu rubripes.

The novel protein contains 1 transmembrane region. 

PUT2 is a Fugu rubripes protein similar to the neural cell adhesion molecule L1 (L1-CAM), a
mitosis-specific chromosome segregation protein (SMC1) and the calcium channel alpha-1 subunit
homolog (CCA1).

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



strong similarity to Fugu rubripes PUT2 

complete cDNA, complete cds, EST hits, 
TRANSMEMBRANE 1 

Sequenced by LMU 

Locus: /map="873.3/875.1 cR from top of Chr1 linkage group"
Insert length: 1552 bp 

Poly A stretch at pos. 1528, polyadenylation signal at pos. 1506



1 GGGGGGGGAC AACTGGGTCT TTTGCGGCTG CAGCGGGCTT GTAGGCGTCC 

51 GGCTTTGCTG GCCCAGCAAG CCTGATAAGC ATGAAGCTCT TATCTTTGGT 

101 GGCTGTGGTC GGGTGTTTGC TGGTGCCCCC AGCTGAAGCC AACAAGAGTT 

151 CTGAAGATAT CCGGTGCAAA TGCATCTGTC CACCTTATAG AAACATCAGT 

201 GGGCACATTT ACAACCAGAA TGTATCCCAG AAGGACTGTT GTAGCAACTG 

251 CCTGCACGTG GTGGAGCCCA TGCCAGTGCC TGGCCATGAC GTGGAGGCCT 

301 ACTGCCTGCT GTGCGAGTGC AGGTACGAGG AGCGCAGCAC CACCACCATC 

351 AAGGTCATCA TTGTCATCTA CCTGTCCGTG GTGGGTGCCC TGTTGCTCTA 

401 CATGGCCTTC CTGATGCTGG TGGACCCTCT GATCCGAAAG CCGGATGCAT 

451 ACACTGAGCA ACTGCACAAT GAGGAGGAGA ATGAGGATGC TCGCTCTATG 

501 GCAGCAGCTG CTGCATCCCT CGGGGGACCC CGAGCAAACA CAGTCCTGGA 

551 GCGTGTGGAA GGTGCCCAGC AGCGGTGGAA GCTGCAGGTG CAGGAGCAGC 

601 GGAAGACAGT CTTCGATCGG CACAAGATGC TCAGCTAGAT GGGCTGGTGT 

651 GGTTGGGTCA AGGCCCCAAC ACCATGGCTG CCAGCTTCCA GGCTGGACAA 

701 AGCAGGGGGC TACTTCTCCC TTCCCTCGGT TCCAGTCTTC CCTTTAAAAG 

751 CCTGTGGCAT TTTTCCTCCT TCTCCCTAAC TTTAGAAATG TTGTACTTGG 

801 CTATTTTGAT TAGGGAAGAG GGATGTGGTC TCTGATCTCT GTTGTCTTCT 

851 TGGGTCTTTG GGGTTGAAGG GAGGGGGAAG GCAGGCCAGA AGGGAATGGA 

901 GACATTCGAG GCGGCCTCAG GAGTGGATGC GATCTGTCTC TCCTGGCTCC 

951 ACTCTTGCCG CCTTCCAGCT CTGAGTCTTG GGAATGTTGT TACCCTTGGA 

1001 AGATAAAGCT GGGTCTTCAG GAACTCAGTG TTTGGGAGGA AAGCATGGCC 

1051 CAGCATTCAG CATGTGTTCC TTTCTGCAGT GGTTCTTATC ACCACCTCCC 

1101 TCCCAGCCCC AGCGCCTCAG CCCCAGCCCC AGCTCCAGCC CTGAGGACAG 

1151 CTCTGATGGG AGAGCTGGGC CCCCTGAGCC CACTGGGTCT TCAGGGTGCA 

1201 CTGGAAGCTG GTGTTCGCTG TCCCCTGTGC ACTTCTCGCA CTGGGGCATG 

1251 GAGTGCCCAT GCATACTCTG CTGCCGGTCC CCTCACCTGC ACTTGAGGGG 

1301 TCTGGGCAGT CCCTCCTCTC CCCAGTGTCC ACAGTCACTG AGCCAGACGG 

1351 TCGGTTGGAA CATGAGACTC GAGGCTGAGC GTGGATCTGA ACACCACAGC 

1401 CCCTGTACTT GGGTTGCCTC TTGTCCCTGA ACTTCGTTGT ACCAGTGCAT 

1451 GGAGAGAAAA TTTTGTCCTC TTGTCTTAGA GTTGTGTGTA AATCAAGGAA 

1501 GCCATCATTA AATTGTTTTA TTTCTCTCAA AAAAAAAAAA AAAAAAAATA 

1551 TC 



BLAST Results 



Entry HS808349 from database EMBL: 
human STS WI-11986. 
Score = 1716, P = 5.7e-73, identities = 364/378

Entry HS4B7355 from database EMBL: 
human STS WI-13088. 
Score = 1358, P = 1.3e-56, identities = 274/277



Medline entries 






No Medline entry 



Peptide information for frame 3 



ORF from 81 bp to 635 bp; peptide length: 185 
Category: similarity to unknown protein 



1 MKLLSLVAVV GCLLVPPAEA NKSSEDIRCK CICPPYRNIS GHIYNQNVSQ 
51 KDCCSNCLHV VEPMPVPGHD VEAYCLLCEC RYEERSTTTI KVIIVIYLSV 
101 VGALLLYMAF LMLVDPLIRK PDAYTEQLHN EEENEDARSM AAAAASLGGP 
151 RANTVLERVE GAQQRWKLQV QEQRKTVFDR HKMLS 
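The ORF coordinates are 1-based and inclusive, and for this clone the range stops just before the TAG stop codon at bases 636-638, so the peptide length is (635 - 81 + 1) / 3 = 185. A minimal Python sketch of that bookkeeping under the standard genetic code; the function names are illustrative only:

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order: TTT, TTC, TTA, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def peptide_length(start: int, end: int) -> int:
    """Codons in a 1-based, inclusive ORF whose end excludes the stop codon."""
    span = end - start + 1
    if span <= 0 or span % 3:
        raise ValueError(f"ORF {start}-{end} is not a whole number of codons")
    return span // 3


def translate_orf(cdna: str, start: int, end: int) -> str:
    """Translate bases start..end (1-based, inclusive); unknown codons become 'X'."""
    n_codons = peptide_length(start, end)
    orf = cdna[start - 1:end].upper()
    return "".join(CODON_TABLE.get(orf[3 * k:3 * k + 3], "X") for k in range(n_codons))


# Bases 1-110 of DKFZphfbr2_16il2 as listed above; the ORF starts with the ATG at 81.
head = (
    "GGGGGGGGAC" "AACTGGGTCT" "TTTGCGGCTG" "CAGCGGGCTT" "GTAGGCGTCC"  # 1-50
    "GGCTTTGCTG" "GCCCAGCAAG" "CCTGATAAGC" "ATGAAGCTCT" "TATCTTTGGT"  # 51-100
    "GGCTGTGGTC"                                                      # 101-110
)
print(translate_orf(head, 81, 110))  # MKLLSLVAVV
print(peptide_length(81, 635))       # 185
```

The same arithmetic is consistent with the other ORFs in this section (832-1155 gives 108, 73-873 gives 267, 156-1856 gives 567 and 270-1829 gives 520 residues).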

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_16il2, frame 3 

TREMBL:AF026198_5 gene: "PUT2"; product: "putative protein 2"; Fugu
rubripes neural cell adhesion molecule L1 homolog (L1-CAM) gene,
complete cds; putative protein 1 (PUT1) gene, partial cds;
mitosis-specific chromosome segregation protein SMC1 homolog (SMC1)
gene, complete cds; and calcium channel alpha-1 subunit homolog (CCA1)
and putative protein 2 (PUT2) genes, partial cds, complete sequence.,
N = 1, Score = 655, P = 2.8e-64

TREMBL:CER12C12_5 gene: "R12C12.6"; Caenorhabditis elegans cosmid
R12C12., N = 1, Score = 225, P = 1e-18



>TREMBL:AF026198_5 gene: "PUT2"; product: "putative protein 2"; Fugu
rubripes neural cell adhesion molecule L1 homolog (L1-CAM) gene, complete
cds; putative protein 1 (PUT1) gene, partial cds; mitosis-specific
chromosome segregation protein SMC1 homolog (SMC1) gene, complete cds; and
calcium channel alpha-1 subunit homolog (CCA1) and putative protein 2
(PUT2) genes, partial cds, complete sequence.
Length = 187 

HSPs: 



Score = 655 (98.3 bits), Expect = 2.8e-64, P = 2.8e-64
Identities = 124/163 (76%), Positives = 140/163 (85%)
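The bracketed percentages in these HSP summaries appear to be truncated rather than rounded: 140/163 is 85.9 % but is printed as 85 %, and 118/264 (44.7 %) is printed as 44 %. A quick check, assuming integer (floor) division is what the search program applied:

```python
def blast_percent(matches: int, length: int) -> int:
    """Percentage as printed in the HSP summaries, assuming truncation (floor)."""
    return 100 * matches // length


# (matches, alignment length, percentage printed in this document)
REPORTED = [
    (124, 163, 76),  # DKFZphfbr2_16il2 vs PUT2, identities
    (140, 163, 85),  # DKFZphfbr2_16il2 vs PUT2, positives
    (118, 264, 44),  # DKFZphfbr2_16112 vs ITMB_CHICK, identities
    (175, 264, 66),  # DKFZphfbr2_16112 vs ITMB_CHICK, positives
    (192, 470, 40),  # DKFZphfbr2_22f21 vs C18C4.5, positives
]
for matches, length, printed in REPORTED:
    exact = 100 * matches / length
    print(f"{matches}/{length}: exact {exact:.1f}, floor {blast_percent(matches, length)}, printed {printed}")
```

Ordinary rounding would give 86 % and 45 % for the second and third rows, so truncation is the simpler explanation of the printed values.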



Query:   22 KSSEDIRCKCICPPYRNISGHIYNQNVSQKDCCSNCLHVVEPMPVPGHDVEAYCLLCECR 81
            KS +D+RCKCICPPYRNISGHIYN+N +QKDC  NCLHVV+PMPVPG+DVEAYCLLCEC+
Sbjct:   31 KSFDDVRCKCICPPYRNISGHIYNRNFTQKDC--NCLHVVDPMPVPGNDVEAYCLLCECK 88

Query:   82 YEERSTTTIKVIIVIYLSVVGALLLYMAFLMLVDPLIRKPDAYTEQLHNEEENEDARSMA 141
            YEERST TI+V I+I+LSVVGALLLYM FL+LVDPLIRKPD   + LHNEE++ED +
Sbjct:   89 YEERSTNTIRVTIIIFLSVVGALLLYMLFLLLVDPLIRKPDPLAQTLHNEEDSEDIQPQM 148

Query:  142 AAAASLGGP-RANTVLERVEGAQQRWKLQVQEQRKTVFDRHKML 184
            + G P R NTVLERVEGAQQRWK QVQEQRKTVFDRHKML
Sbjct:  149 S GDPARGNTVLERVEGAQQRWKKQVQEQRKTVFDRHKML 187





Pedant information for DKFZphfbr2_16il2, frame 3 



Report for DKFZphfbr2_16il2.3



[LENGTH] 185 

[MW] 20764.29 

[pi] 6.21 

[HOMOL] TREMBL:AF026198_5 gene: "PUT2"; product: "putative protein 2"; Fugu rubripes 

neural cell adhesion molecule L1 homolog (L1-CAM) gene, complete cds; putative protein 1
(PUT1) gene, partial cds; mitosis-specific chromosome segregation protein SMC1 homolog (SMC1)
gene, complete cds; and calcium channel alpha-1 subunit homolog (CCA1) and putative protein 2
(PUT2) genes, partial cds, complete sequence. 3e-68 
[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 4 

[PROSITE] PKC_PHOSPHO_SITE 2

[PROSITE] ASN_GLYCOSYLATION 3

[KW] SIGNAL PEPTIDE 21 






[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 2.70 % 



SEQ MKLLSLVAVVGCLLVPPAEANKSSEDIRCKCICPPYRNISGHIYNQNVSQKDCCSNCLHV

SEG 

PRD ccceeeeeeeeccccccccccccccceeeeeecccccccccceeeccccccccccceeee 

MEM 

SEQ VEPMPVPGHDVEAYCLLCECRYEERSTTTIKVIIVIYLSVVGALLLYMAFLMLVDPLIRK

SEG

PRD eecccccccccchhhhhhhhhhhhccccceeeeeeehhhhhhhhhhhhhhhhhhhccccc

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM . . . 

SEQ PDAYTEQLHNEEENEDARSMAAAAASLGGPRANTVLERVEGAQQRWKLQVQEQRKTVFDR 

SEG xxxxx 

PRD ccchhhhhhhhhcccchhhhhhhhhhccccccchhhhhhhchhhhhhhhhhhhhhhhhhh 

MEM 

SEQ HKMLS 

SEG 

PRD hhccc 

MEM 
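The LOW_COMPLEXITY keyword appears to be the share of residues masked with x on the SEG track: the five masked residues above give 5/185 = 2.70 %, and the same rule reproduces 15.36 % (41 of 267 residues) for DKFZphfbr2_16112 and 1.23 % (7 of 567) for DKFZphfbr2_22f21. A sketch of that arithmetic, assuming this is how the Pedant value was derived:

```python
def low_complexity_percent(seg_track: str, length: int) -> str:
    """Share of residues masked ('x' or 'X') on a SEG track, formatted like the reports."""
    masked = sum(ch in "xX" for ch in seg_track)
    return f"{100 * masked / length:.2f} %"


# SEG tracks joined over all rows of each report (unmasked rows are blank).
print(low_complexity_percent("xxxxx", 185))    # DKFZphfbr2_16il2 -> 2.70 %
print(low_complexity_percent("x" * 41, 267))   # DKFZphfbr2_16112 -> 15.36 %
print(low_complexity_percent("xxxxxxx", 567))  # DKFZphfbr2_22f21 -> 1.23 %
```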



Prosite for DKFZphfbr2_16il2.3

PS00001 21->25 ASN_GLYCOSYLATION PDOC00001

PS00001 38->42 ASN_GLYCOSYLATION PDOC00001

PS00001 47->51 ASN_GLYCOSYLATION PDOC00001

PS00005 49->52 PKC_PHOSPHO_SITE PDOC00005

PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005

PS00006 23->27 CK2_PHOSPHO_SITE PDOC00006

PS00006 49->53 CK2_PHOSPHO_SITE PDOC00006

PS00006 154->158 CK2_PHOSPHO_SITE PDOC00006

PS00006 176->180 CK2_PHOSPHO_SITE PDOC00006

PS00008 148->154 MYRISTYL PDOC00008
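The "a->b" columns in these PROSITE tables appear to give a 1-based start and an end one past the last matched residue: the PS00001 pattern N-{P}-[ST]-{P} covers four residues (21-24), yet is listed as 21->25. A minimal sketch that converts simple PROSITE patterns into Python regular expressions and reproduces the table above for the DKFZphfbr2_16il2 frame 3 peptide. The pattern strings are the standard PROSITE definitions of PS00001, PS00005, PS00006 and PS00008; only the syntax those patterns use (x, [..], {..}, repeat counts) is handled, and the function names are illustrative:

```python
import re
from typing import List, Tuple

PATTERNS = {
    "ASN_GLYCOSYLATION": "N-{P}-[ST]-{P}",          # PS00001
    "PKC_PHOSPHO_SITE": "[ST]-x-[RK]",              # PS00005
    "CK2_PHOSPHO_SITE": "[ST]-x(2)-[DE]",           # PS00006
    "MYRISTYL": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",  # PS00008
}

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regex (no < or > anchors)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..} = any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(sequence: str, pattern: str) -> List[Tuple[int, int]]:
    """All (possibly overlapping) hits as (1-based start, start + match length)."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + 1 + len(m.group(1))) for m in regex.finditer(sequence)]


PEPTIDE = (  # DKFZphfbr2_16il2, frame 3, from the peptide listing above
    "MKLLSLVAVVGCLLVPPAEANKSSEDIRCKCICPPYRNISGHIYNQNVSQ"
    "KDCCSNCLHVVEPMPVPGHDVEAYCLLCECRYEERSTTTIKVIIVIYLSV"
    "VGALLLYMAFLMLVDPLIRKPDAYTEQLHNEEENEDARSMAAAAASLGGP"
    "RANTVLERVEGAQQRWKLQVQEQRKTVFDRHKMLS"
)
for name, pattern in PATTERNS.items():
    for start, end in scan(PEPTIDE, pattern):
        print(f"{start}->{end} {name}")
```

The lookahead is what allows overlapping hits such as the PKC and CK2 sites that both begin at residue 49.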



(No Pfam data available for DKFZphfbr2_16il2.3)

DKFZphfbr2_16k22






group: brain derived 

DKFZphfbr2_16k22 encodes a novel 108 amino acid protein with very weak similarity to 
thioredoxin of Bacillus subtilis. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



weak similarity to thioredoxin 

complete cDNA, complete cds, genomic DNA? 
no EST hits 

Sequenced by BMFZ 

Locus: unknown

Insert length: 2088 bp

Poly A stretch at pos. 2065, no polyadenylation signal found



1 AAAAGGAAGA AGGAAATAAG GATATTTCAA GGGTTACCAA AGTCGAGGAA 

51 AACTATTTTA AGAAGAAATC TGAATTATTT GTGCACATAG GTTGTAATAA 

101 TAGCATCTTG CATTAAATGG TGTTTTCTAG CTTACAAAGT GGATTCATAT 

151 ACACTATTGT AACTGACTCT CTACAAACTT GCAAGGTTAG CAAGACAAAT 

201 GGTATTTTAA GATAACAAAC TGAGACTCAA AAAAGGCAAG TAACTCGTTC 

251 TACTTCCCAA AGCCAGAAAG TGGCAAAATA GAAAATGGAT CCTGAATCTC 

301 CAACACCATG CAAACTAAGA GAGGGAATCC TCTGTAGAGG GAATGGAAGT 

351 AAAAAGGCAC AAGTGGTGAT GTCACCTTCT GAACAGAGAT GGAACTTTTC 

401 TTCCTCTGAG AAAAAAGAGA AAAGATAGTT TTAAGTGGCA AAAGAACATG 

451 AAGCAATGTG AGGTGAAGAA ACAGAAAAGA CTATGGATGG AATTCCTAGA 

501 TGTGAGATAC ACAAAGTTCC ATTTCAAAGA GAAATATCTA TAGATAGGCA 

551 TAAAGTTACA CACCTGAACT ACCAACTCTG AACCAGTAAC TCAAGAGATA 

601 TTTTGTGTGT CCCACAAGCC ATATGGCTCT GGGGACAAAT TATCTGAAAG 

651 TGCCCAATAA GAAAAATATT TGAGGAAGGG GAGTTGGTGA GTGAATGAAT 

701 TAAAGGACAT CAGAAAGATA CATTGACTGT TCTCCTTCCC AGGAAACAAA 

751 GTGGCTAAGT CAAAACAACG GGCAGCTGTG GGATAGCAAA GAAAAAAAAA 

801 CTTCCAGGCC CAGGTTCTAG TGAAAGCTAC TATGGAAGTT AGCCACTCAA 

851 CTTTAGAACC AGAGGCTTCT TTTCCTCCTC CCTTCTTATC TTTTCTAGTT 

901 TATAGCAAAT TTATATTGAG CCACTTATTC TTTCTGAATG CTAGTTCCCC 

951 TTTAGCATTT CTTTTTCTTC ATTCCCTTTG GACTGGCCCA ATGCTTTGGC 

1001 CCCTTATCAA AGCATTTTCT AAGAAACAGT CTGACAGCTC TAATTTGCAT 

1051 CTGGTTATGC AAGATGTGGT TAAGAACATG GACTCTGGAG GTAAATACAC 

1101 CTTGATTCCA ATTCATTCTC TCATTTATTC ATTCAGCAAA TATTTAGTGA 

1151 ACATCTAACA TGTGCTAGGC ACTGTTCTAG TTGCTGAGGA TACAGCTTCA 

1201 AACAAAATAA GGTCTCTGCA AGGATGCCTT CTCTTACCAC TCCTATTCAG 

1251 CGTAGTATTG GAAGTCCTGG CCAGGGCAAT CAGGCAAGAA AAAGAAATCA 

1301 AGGTCATCCA AATAGGAAGA GAGGAAGTCA AACTATCCCT GTTTACAGAC 

1351 AACATGATCC TACATCTAGA AAAAAACCCA TTGTCTTAGC CCAAAAGCTT 

1401 CTTAGGCTGA TAAACAACTT CAGCAAAGTC TTAGGATACA AAATCCATGT 

1451 GCAAAAAACA CTAGCATTCT TATACACCAA CAACAGTCAA GCCGAGATCC 

1501 AAATCAGGAA CAAACTCCTA TTCACAATTG CCACAAAAAC AATAGAACAG 

1551 GAAAACAGCT AACTAGGAAG GTGAAAGATC TCTACAAGGA GAACTACAAA 

1601 CCACTGCTCA CAGAAATCAG AGATGACACA TATAAATGGA AAAACATTCC

1651 ATGATCATGG ATAGGAAGAA TGAATATTAC TGAAATGGCT ATACTGTCCA 

1701 AAGCAATTTA TAGATTCAAT GCTATTCCTA GTAAACTACC ATTGAGATTT 

1751 TTTACAGAAC TAGAAAAAAA AAAAACTATT TTAAGGCTGG GCGCAGTGGC 

1801 TCTCACCTGT AATCCCAGCA CTTTGGGAGG CCGAGATGGG TGGATCACGA 

1851 GGTCAGGAGA TGGAAAACAT CCTGGCTAAC ATGGTGAAAC CCCGTCTCTA 

1901 CTAAAAATAC AAAAAATTAG CCAGGCGTGG TGGTGGGCGC CTGTAATCCC 

1951 AGCTGCTCGG GAGGCTGAGG CAGGATAATG GTGTGAACCC GGGAGGCAGA 

2001 GCTTGCAGTG AGCTGAGATT GCACCACTGC ACTCCAGCCT GAGGGACAGA 
2051 GTGAGACTCC ATCTCAAAAA AAAAAAAAAA AAAAAAAA 



BLAST Results 



No BLAST result



Medline entries 






No Medline entry 



Peptide information for frame 1 



ORF from 832 bp to 1155 bp; peptide length: 108 
Category: putative protein 



1 MEVSHSTLEP EASFPPPFLS FLVYSKFILS HLFFLNASSP LAFLFLHSLW 
51 TGPMLWPLIK AFSKKQSDSS NLHLVMQDVV KNMDSGGKYT LIPIHSLIYS
101 FSKYLVNI 



BLASTP hits
Entry B37192 from database PIR:

thioredoxin - Bacillus subtilis
Score = 71 (25.0 bits), Expect = 0.040, P = 0.039

Identities = 16/49 (32%), Positives = 30/49 (61%)
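The Expect and P values printed for each hit are consistent with the Poisson relation P = 1 - exp(-E): E = 0.040 gives 0.039 here, the E = 0.29 hit of DKFZphfbr2_22f21 gives 0.25, and for strong hits such as E = 1.4e-55 the two are indistinguishable. A short check, assuming that relation is what the search program applied:

```python
import math


def p_from_expect(expect: float) -> float:
    """P = 1 - exp(-E): chance of at least one hit scoring this well, if hits are Poisson."""
    return -math.expm1(-expect)  # expm1 keeps precision when E is tiny


# (Expect, P as printed in this document)
for expect, printed in [(0.040, "0.039"), (0.29, "0.25"), (1.4e-55, "1.4e-55")]:
    print(f"E = {expect:g}: computed P = {p_from_expect(expect):.2g}, printed P = {printed}")
```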



Alert BLASTP hits for DKFZphfbr2_16k22, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_16k22, frame 1 

Report for DKFZphfbr2_16k22.1



[LENGTH] 108

[MW] 12281.47

[pi] 8.06

[PROSITE] MYRISTYL 1

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 1

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta



SEQ MEVSHSTLEPEASFPPPFLSFLVYSKFILSHLFFLNASSPLAFLFLHSLWTGPMLWPLIK
PRD ccccccccccccccccccchhhhhhhhhhhhhhhhccccchhhhhhhhccccccchhhhh

SEQ AFSKKQSDSSNLHLVMQDVVKNMDSGGKYTLIPIHSLIYSFSKYLVNI
PRD hhhcccccccceeehhhhhhcccccccceeeeeccceeeecccccccc



Prosite for DKFZphfbr2_16k22.1

PS00001 36->40 ASN_GLYCOSYLATION PDOC00001 

PS00004 64->68 CAMP_PHOSPHO_SITE PDOC00004 

PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005

PS00006 6->10 CK2_PHOSPHO_SITE PDOC00006

PS00008 86->92 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphfbr2_16k22.1)






DKFZphfbr2_16112 



group: transmembrane protein 

DKFZphfbr2_16112 encodes a novel 267 amino acid protein with similarity to the Gallus gallus
putative transmembrane protein E3-16.

The novel protein contains one putative transmembrane domain. In chicken, E3-16 is expressed 
specifically in the inner ear. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neurons involved in perception of hearing. 

similarity to Gallus gallus putative transmembrane protein E3-16
complete cDNA, complete cds, EST hits 

potential start at bp 73 matches Kozak consensus PyCCataG
TRANSMEMBRANE 1 
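The start-codon note can be cross-checked with the commonly used two-position reading of the Kozak consensus (gccRccAUGG): a purine at -3 together with a G at +4 is usually called a strong context, one of the two an adequate one. For this clone the ATG at bp 73 has G at -3 (bp 70) and G at +4 (bp 76). A minimal sketch; the strong/adequate/weak labels follow that convention and are not taken from the source:

```python
def kozak_context(cdna: str, atg: int) -> str:
    """Classify the context of an ATG at 1-based position atg (A of ATG = +1).

    Only positions -3 (purine?) and +4 (G?) are inspected.
    """
    i = atg - 1  # 0-based index of the A
    if cdna[i:i + 3].upper() != "ATG":
        raise ValueError(f"no ATG at position {atg}")
    if i < 3 or i + 3 >= len(cdna):
        raise ValueError("not enough flanking sequence")
    purine_minus3 = cdna[i - 3].upper() in "AG"
    g_plus4 = cdna[i + 3].upper() == "G"
    if purine_minus3 and g_plus4:
        return "strong"
    if purine_minus3 or g_plus4:
        return "adequate"
    return "weak"


# Bases 61-80 of DKFZphfbr2_16112; the ATG at bp 73 is position 13 of this fragment.
print(kozak_context("GCCGGCGCAG" "CCATGGTGAA", 13))  # strong
```

By the same reading, the ATG at bp 81 of DKFZphfbr2_16il2 (A at -3, A at +4) would be classed as adequate.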

Sequenced by Qiagen 

Locus: unknown

Insert length: 2042 bp

Poly A stretch at pos. 2024, polyadenylation signal at pos. 2003

1 GGGGGCGGCG GAGGCAGAGA CCGAGGCTGC ACCGGCAGAG GCTGCGGGGC 
51 GGACGCGCGG GCCGGCGCAG CCATGGTGAA GATTAGCTTC CAGCCCGCCG 

101 TGGCTGGCAT CAAGGGCGAC AAGGCTGACA AGGCGTCGGC GTCGGCCCCT 

151 GCGCCGGCCT CGGCCACCGA GATCCTGCTG ACGCCGGCTA GGGAGGAGCA 

201 GCCCCCACAA CATCGATCCA AGAGGGGGGG CTCAGTGGGC GGCGTGTGCT 

251 ACCTGTCGAT GGGCATGGTC GTGCTGCTCA TGGGCCTCGT GTTCGCCTCT 

301 GTCTACATCT ACAGATACTT CTTCCTTGCG CAGCTGGCCC GAGATAACTT

351 CTTCCGCTGT GGTGTGCTGT ATGAGGACTC CCTGTCCTCC CAGGTCCGGA 

401 CTCAGATGGA GCTGGAAGAG GATGTGAAAA TCTACCTCGA CGAGAACTAC 

451 GAGCGCATCA ACGTGCCTGT GCCCCAGTTT GGCGGCGGTG ACCCTGCAGA

501 CATCATCCAT GACTTCCAGC GGGGTCTGAC TGCGTACCAT GATATCTCCC 

551 TGGACAAGTG CTATGTCATC GAACTCAACA CCACCATTGT GCTGCCCCCT 

601 CGCAACTTCT GGGAGCTCCT CATGAACGTG AAGAGGGGGA CCTACCTGCC 

651 GCAGACGTAC ATCATCCAGG AGGAGATGGT GGTCACGGAG CATGTCAGTG 

701 ACAAGGAGGC CCTGGGGTCC TTCATCTACC ACCTGTGCAA CGGGAAAGAC 

751 ACCTACCGGC TCCGGCGCCG GGCAACGCGG AGGCGGATCA ACAAGCGTGG 

801 GGCCAAGAAC TGCAATGCCA TCCGCCACTT CGAGAACACC TTCGTGGTGG 

851 AGACGCTCAT CTGCGGGGTG GTGTGAGGCC CTCCTCCCCC AGAACCCCCT

901 GCCGTGTTCC TCTTTTCTTC TTTCCGGCTG CTCTCTGGCC CTCCTCCTTC 

951 CCCCTGCTTA GCTTGTACTT TGGACGCGTT TCTATAGAGG TGACATGTCT 
1001 CTCCATTCCT CTCCAACCCT GCCCACCTCC CTGTACCAGA GCTGTGATCT 
1051 CTCGGTGGGG GGCCCATCTC TGCTGACCTG GGTGTGGCGG AGGGAGAGGC 
1101 GATGCTGCAA AGTGTTTTCT GTGTCCCACT GTCTTGAAGC TGGGCCTGCC 
1151 AAAGCCTGGG CCCACAGCTG CACCGGCAGC CCAAGGGGAA GGACCGGTTG 
1201 GGGGAGCCGG GCATGTGAGG CCCTGGGCAA GGGGATGGGG CTGTGGGGGC 
1251 GGGGCGGCAT GGGCTTCAGA AGTATCTGCA CAATTAGAAA AGTCCTCAGA 
1301 AGCTTTTTCT TGGAGGGTAC ACTTTCTTCA CTGTCCCTAT TCCTAGACCT 
1351 GGGGCTTGAG CTGAGGATGG GACGATGTGC CCAGGGAGGG ACCCACCAGA 
1401 GCACAAGAGA AGGTGGCTAC CTGGGGGTGT CCCAGGGACT CTGTCAGTGC 
1451 CTTCAGCCCA CCAGCAGGAG CTTGGAGTTT GGGGAGTGGG GATGAGTCCG
1501 TCAAGCACAA CTGTTCTCTG AGTGGAACCA AAGAAGCAAG GAGCTAGGAC 
1551 CCCCAGTCCT GCCCCCCAGG AGCACAAGCA GGGTCCCCTC AGTCAAGGCA 
1601 GTGGGATGGG CGGCTGAGGA ACGGGGCAGG CAAGGTCACT GCTCAGTCAC 
1651 GTCCACGGGG GACGAGCCGT GGGTTCTGCT GAGTAGGTGG AGCTCATTGC 
1701 TTTCTCCAAG CTTGGAACTG TTTTGAAAGA TAACACAGAG GGAAAGGGAG 
1751 AGCCACCTGG TACTTGTCCA CCCTGCCTCC TCTGTTCTGA AATTCCATCC 
1801 CCCTCAGCTT AGGGGAATGC ACCTTTTTCC CTTTCCTTCT CACTTTTGCA 
1851 TGTTTTTACT GATCATTCGA TATGCTAACC GTTCTCAGCC CTGAGCCTTG 
1901 GAGAGGAGGG CTGTAACGCC TTCAGTCAGT CTCTGGGGAT GAAACTCTTA 
1951 AATGCTTTGT ATATTTTCTC AATTAGATCT CTTTTCAGAA GTGTCTATAG 
2001 AACAATAAAA ATCTTTTACT TCTGAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 






Medline entries 



96325063: 

Isolation of markers for chondro-osteogenic differentiation 
using cDNA library subtraction. Molecular cloning and 
characterization of a gene belonging to a novel multigene 
family of integral membrane proteins. 



Peptide information for frame 1 



ORF from 73 bp to 873 bp; peptide length: 267 
Category: similarity to known protein 



1 MVKISFQPAV AGIKGDKADK ASASAPAPAS ATEILLTPAR EEQPPQHRSK 
51 RGGSVGGVCY LSMGMVVLLM GLVFASVYIY RYFFLAQLAR DNFFRCGVLY
101 EDSLSSQVRT QMELEEDVKI YLDENYERIN VPVPQFGGGD PADIIHDFQR 
151 GLTAYHDISL DKCYVIELNT TIVLPPRNFW ELLMNVKRGT YLPQTYIIQE 
201 EMVVTEHVSD KEALGSFIYH LCNGKDTYRL RRRATRRRIN KRGAKNCNAI 
251 RHFENTFVVE TLICGVV

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_16112, frame 1 

SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN
E3-16)., N = 1, Score = 573, P = 1.4e-55

SWISSNEW:ITMB_MOUSE INTEGRAL MEMBRANE PROTEIN 2B (E25B PROTEIN)., N =
1, Score = 559, P = 4.2e-54

SWISSNEW:ITMA_HUMAN INTEGRAL MEMBRANE PROTEIN 2A (E25 PROTEIN)., N = 1,
Score = 452, P = 9.1e-43

>SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN
E3-16).

Length = 262

HSPs: 

Score = 573 (86.0 bits), Expect = 1.4e-55, P = 1.4e-55
Identities = 118/264 (44%), Positives = 175/264 (66%) 



            MVK+SF A+A + A+K ++ ++L+ P + + P+ G C+
Sbjct:    1 MVKVSFNSALA--HKEAANKEEENS QVLILPP-DAKEPEDVVVPAGHKRAWCW 50

Query:   61 -LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLY-EDSLS SQVRTQM- 112
            + G+ +L G++ Y+Y+YF Q + CG+ Y ED LS +Q+++

            +E++++I +E+ E I+VPVP+F DPADI+HDF R LTAY D+SLDKCYVI LNT+
Sbjct:  108 HTIEQNIQILEEEDVEFISVPVPEFADSDPADIVHDFHRRLTAYLDLSLDKCYVIPLNTS 167

Query:  172 IVLPPRNFWELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLR 231
            +V+PP+NF ELL+N+K GTYLPQ+Y+I E+M+VT+ + + + LG FIY LC GK+TY+L+
Sbjct:  168 VVMPPKNFLELLINIKAGTYLPQSYLIHEQMIVTDRIENVDQLGFFIYRLCRGKETYKLQ 227

Query:  232 RRATRRRINKRGAKNCNAIRHFENTFVVETLIC 264
            R+   + I KR A NC  IRHFEN F +ETLIC
Sbjct:  228 RKEAMKGIQKREAVNCRKIRHFENRFAMETLIC 260

Pedant information for DKFZphfbr2_16112, frame 1

Report for DKFZphfbr2_16112.1

[LENGTH] 267

[MW] 30223.94

[pi] 8.16

[HOMOL] SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN E3-16). 1e-49

[PROSITE] PRENYLATION 1

[PROSITE] MYRISTYL 5

[PROSITE] CAMP_PHOSPHO_SITE 2

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 4

[PROSITE] ASN_GLYCOSYLATION 1

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 15.36 %



SEQ MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGGSVGGVCY 

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccchhhhhhhhhhhhhhhhhccccccceeecccccccccccccccccccccchh 

MEM MMMMMMMMM 



SEQ LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLYEDSLSSQVRTQMELEEDVKI 

SEG . .xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhccceeeeeecccccccchhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMM 



SEQ YLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTIVLPPRNFW 

SEG 

PRD hhcccceeeeccccccccccccchhhhhhhhhhhhhhhcccceeeeeccceeecccchhh 

MEM 



SEQ ELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRRRATRRRIN 

SEG . xxxxxxxxxxxx 

PRD hhhhhhcccccccceeeeehhhhhhhccccchhhhhheeeccccchhhhhhhhhhhhhhh 

MEM 



SEQ KRGAKNCNAIRHFENTFVVETLICGVV

SEG XX 

PRD hhhhccceeeecccchhhhhheeeccc 

MEM 



Prosite for DKFZphfbr2_16112.1

PS00001 169->173 ASN_GLYCOSYLATION PDOC00001

PS00004 187->191 CAMP_PHOSPHO_SITE PDOC00004

PS00004 232->236 CAMP_PHOSPHO_SITE PDOC00004

PS00005 49->52 PKC_PHOSPHO_SITE PDOC00005

PS00005 209->212 PKC_PHOSPHO_SITE PDOC00005

PS00005 227->230 PKC_PHOSPHO_SITE PDOC00005

PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005

PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006

PS00006 110->114 CK2_PHOSPHO_SITE PDOC00006

PS00006 209->213 CK2_PHOSPHO_SITE PDOC00006

PS00007 119->127 TYR_PHOSPHO_SITE PDOC00007

PS00008 52->58 MYRISTYL PDOC00008

PS00008 53->59 MYRISTYL PDOC00008

PS00008 71->77 MYRISTYL PDOC00008

PS00008 138->144 MYRISTYL PDOC00008

PS00008 243->249 MYRISTYL PDOC00008

PS00294 264->268 PRENYLATION PDOC00266



(No Pfam data available for DKFZphfbr2_16112.1)
DKFZphfbr2_22f21
group: brain derived 

DKFZphfbr2_22f21 encodes a novel 567 amino acid protein with weak similarity to C. elegans
C18C4.5 (cosmid C18C4).

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

weak similarity to C. elegans C18C4.5 

ESTs HSAA6531/HSAA5273 define a splice variant, or an unspliced cDNA, with an additional
about 180 bp at position 250

Sequenced by AGOWA 

Locus: /map="311.4 cR from top of Chr14 linkage group"
Insert length: 1910 bp 

Poly A stretch at pos. 1887, polyadenylation signal at pos. 1867 

1 TGGGCCCTTA GCAACGGCCT GGCGACGGTT TCCTTGCTGC TGCAGCCCCC 

51 GTCGGCTCCT CTTTTCCAGT CCTCCACTGC CGGGGCTGGG CCCGGCCGCG 

101 GGAAGGACCG AAGGGGATAC AGCGTGTCCC TGCGGCGGCT GCAAGAGGAC 

151 TAAGCATGGA TGGCAGCCGG AGAGTCAGAG CAACCTCTGT CCTTCCCAGA 

201 TATGGTCCAC CGTGCCTATT TAAAGGACAC TTGAGCACCA AAAGTAATGC 

251 TGCAGTAGAC TGCTCGGTTC CAGTAAGCAT GAGTACCAGC ATAAAGTATG 

301 CAGACCAACA ACGAAGAGAG AAACTCAAAA AGGAATTAGC ACAATGTGAA 

351 AAAGAGTTCA AATTAACTAA AACTGCAATG CGAGCCAATT ATAAAAATAA 

401 TTCCAAGTCA CTTTTTAATA CCTTACAAGA GCCCTCAGGC GAACCGCAAA 

451 TTGAGGATGA CATGTTAAAA GAAGAAATGA ATGGATTTTC ATCCTTTGCA 

501 AGGTCACTAG TACCCTCTTC AGAGAGACTA CACCTAAGTC TACATAAATC 

551 CAGTAAAGTC ATCACAAATG GTCCTGAGAA GAACTCCAGT TCCTCCCCGT 

601 CCAGTGTGGA TTATGCAGCC TCCGGGCCCC GGAAACTGAG CTCTGGAGCC 

651 CTGTATGGCA GAAGGCCCAG AAGCACATTC CCAAATTCCC ACCGGTTTCA 

701 GTTAGTCATT TCGAAAGCAC CCAGTGGGGA TCTTTTGGAT AAACATTCTG 

751 AACTCTTTTC TAACAAACAA TTGCCATTCA CTCCTCGCAC TTTAAAAACA 

801 GAAGCAAAAT CTTTCCTGTC ACAGTATCGC TATTATACAC CTGCCAAAAG 

851 AAAAAAGGAT TTTACAGATC AACGGATAGA AGCTGAAACC CAGACTGAAT 

901 TAAGCTTTAA ATCTGAGTTG GGGACAGCTG AGACTAAAAA CATGACAGAT 

951 TCAGAAATGA ACATAAAGCA GGCATCTAAT TGTGTGACAT ATGATGCCAA 

1001 AGAAAAAATA GCTCCTTTAC CTTTAGAAGG GCATGACTCA ACATGGGATG 

1051 AGATTAAGGA TGATGCTCTT CAGCATTCCT CACCAAGGGC AATGTGTCAG 

1101 TATTCCCTGA AGCCCCCTTC AACTCGTAAA ATCTACTCTG ATGAAGAAGA 

1151 ACTGTTGTAT CTGAGTTTCA TTGAAGATGT AACAGATGAA ATTTTGAAAC 

1201 TTGGTTTATT TTCAAACAGG TTTTTAGAAC GACTGTTCGA GCGACATATA 

1251 AAACAAAATA AACATTTGGA GGGGGAAAAA ATGCGCCACC TGCTGCATGT 

1301 CCTGAAAGTA GACTTAGGCT GCACATCGGA GGAAAACTCG GTAAAGCAAA 

1351 ATGATGTTGA TATGTTGAAT GTATTTGATT TTGAAAAGGC TGGGAATTCA 

1401 GAACCAAATA AATTAAAAAA TGAAAGTGAA GTAACAATTC AGCAGGAACG 

1451 TCAACAATAC CAAAAGGCTT TGGATATGTT ATTGTCGGCA CCAAAGGATG 

1501 AGAACGAGAT ATTCCCTTCA CCAACTGAAT TTTTCATGCC TATTTATAAA 

1551 TCAAAGCATT CAGAAGGGGT TATAATTCAA CAGGTGAATG ATGAAACAAA 

1601 TCTTGAAACT TCAACTTTGG ATGAAAATCA TCCAAGTATT TCAGACAGTT 

1651 TAACAGATCG GGAAACTTCT GTGAATGTCA TTGAAGGTGA TAGTGACCCT 

1701 GAAAAGGTTG AGATTTCAAA TGGATTATGT GGTCTTAACA CATCACCCTC 

1751 CCAATCTGTT CAGTTCTCCA GTGTCAAAGG CGACAATAAT CATGACATGG 

1801 AGTTATCAAC TCTTAAAATC ATGGAAATGA GCATTGAGGA CTGCCCTTTG 

1851 GATGTTTAAT CTTCATTAAT AAATACCTCA AATGGCCAGT AAAAAAAAAA 

1901 AAAAAAAAAA 



BLAST Results 



Entry HS477360 from database EMBL: 
human STS WI-14643. 
Length = 418
Minus Strand HSPs:

Score = 1850 (277.6 bits), Expect = 2.5e-77, P = 2.5e-77

Identities = 392/405 (96%), Positives = 392/405 (96%), Strand = Minus / Plus
Medline entries



No Medline entry 



Peptide information for frame 3 



ORF from 156 bp to 1856 bp; peptide length: 567 
Category: similarity to unknown protein 



1 MDGSRRVRAT SVLPRYGPPC LFKGHLSTKS NAAVDCSVPV SMSTSIKYAD
51 QQRREKLKKE LAQCEKEFKL TKTAMRANYK NNSKSLFNTL QEPSGEPQIE
101 DDMLKEEMNG FSSFARSLVP SSERLHLSLH KSSKVITNGP EKNSSSSPSS
151 VDYAASGPRK LSSGALYGRR PRSTFPNSHR FQLVISKAPS GDLLDKHSEL
201 FSNKQLPFTP RTLKTEAKSF LSQYRYYTPA KRKKDFTDQR IEAETQTELS
251 FKSELGTAET KNMTDSEMNI KQASNCVTYD AKEKIAPLPL EGHDSTWDEI
301 KDDALQHSSP RAMCQYSLKP PSTRKIYSDE EELLYLSFIE DVTDEILKLG
351 LFSNRFLERL FERHIKQNKH LEGEKMRHLL HVLKVDLGCT SEENSVKQND
401 VDMLNVFDFE KAGNSEPNKL KNESEVTIQQ ERQQYQKALD MLLSAPKDEN
451 EIFPSPTEFF MPIYKSKHSE GVIIQQVNDE TNLETSTLDE NHPSISDSLT
501 DRETSVNVIE GDSDPEKVEI SNGLCGLNTS PSQSVQFSSV KGDNNHDMEL
551 STLKIMEMSI EDCPLDV



BLASTP hits 



Entry CEC18C4_3 from database TREMBL: 
"C18C4.5"; Caenorhabditis elegans cosmid C18C4. 
Length = 1091 

Score = 98 (34.5 bits), Expect = 0.29, P = 0.25
Identities = 105/470 (22%), Positives = 192/470 (40%) 



Alert BLASTP hits for DKFZphfbr2_22f21, frame 3
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_22f21, frame 3 



Report for DKFZphfbr2_22f21.3



[LENGTH] 567

[MW] 64120.02

[pi] 5.68

[PROSITE] AMIDATION 1

[PROSITE] MYRISTYL 3

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 16

[PROSITE] PKC_PHOSPHO_SITE 18

[PROSITE] ASN_GLYCOSYLATION 4

[KW] All_Alpha

[KW] LOW_COMPLEXITY 1.23 %



SEQ MDGSRRVRATSVLPRYGPPCLFKGHLSTKSNAAVDCSVPVSMSTSIKYADQQRREKLKKE 

SEG 

PRD cccccceeeeeeccccccccccccccccccceeeecccccccchhhhhhhhhhhhhhhhh 

SEQ LAQCEKEFKLTKTAMRANYKNNSKSLFNTLQEPSGEPQIEDDMLKEEMNGFSSFARSLVP 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccccceeecccccccchhhhhhhhhhhccccccceeecc 

SEQ SSERLHLSLHKSSKVITNGPEKNSSSSPSSVDYAASGPRKLSSGALYGRRPRSTFPNSHR 

SEG xxxxxxx 

PRD ccchhhhhhhhceeeecccccccccccccccccccccccccccccccccccccccccccc 

SEQ FQLVISKAPSGDLLDKHSELFSNKQLPFTPRTLKTEAKSFLSQYRYYTPAKRKKDFTDQR 

SEG 

PRD cceeeeeccccccccccccccccccccccccchhhhhhhhhhhhhccccccchhhhhhhh 

SEQ IEAETQTELSFKSELGTAETKNMTDSEMNIKQASNCVTYDAKEKIAPLPLEGHDSTWDEI 

SEG 

PRD hhhhhhhhhhhhhhccccccccccchhhhhhhccceeehhhhhhcccccccccccccccc 
SEQ KDDALQHSSPRAMCQYSLKPPSTRKIYSDEEELLYLSFIEDVTDEILKLGLFSNRFLERL

SEG

PRD cccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhccchhhhhhh

SEQ FERHIKQNKHLEGEKMRHLLHVLKVDLGCTSEENSVKQNDVDMLNVFDFEKAGNSEPNKL

SEG

PRD hhhhhhhhhhcccchhhhhhhhhccccccccccccccccccccceeeecccccccccccc

SEQ KNESEVTIQQERQQYQKALDMLLSAPKDENEIFPSPTEFFMPIYKSKHSEGVIIQQVNDE

SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccceeeeecccc

SEQ TNLETSTLDENHPSISDSLTDRETSVNVIEGDSDPEKVEISNGLCGLNTSPSQSVQFSSV

SEG

PRD ccccccccccccccccccccccccceeecccccccceeeeccccccccccccceeeeecc

SEQ KGDNNHDMELSTLKIMEMSIEDCPLDV

SEG

PRD ccccccchhhhhhhhhhhhhccccccc
PS00001 81->85 ASN_GLYCOSYLATION PDOC00001

PS00001 143->147 ASN_GLYCOSYLATION PDOC00001

PS00001 262->266 ASN_GLYCOSYLATION PDOC00001

PS00001 422->426 ASN_GLYCOSYLATION PDOC00001

PS00004 159->163 CAMP_PHOSPHO_SITE PDOC00004

PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005

PS00005 27->30 PKC_PHOSPHO_SITE PDOC00005

PS00005 45->48 PKC_PHOSPHO_SITE PDOC00005

PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005

PS00005 132->135 PKC_PHOSPHO_SITE PDOC00005

PS00005 178->181 PKC_PHOSPHO_SITE PDOC00005

PS00005 202->205 PKC_PHOSPHO_SITE PDOC00005

PS00005 209->212 PKC_PHOSPHO_SITE PDOC00005

PS00005 212->215 PKC_PHOSPHO_SITE PDOC00005

PS00005 250->253 PKC_PHOSPHO_SITE PDOC00005

PS00005 309->312 PKC_PHOSPHO_SITE PDOC00005

PS00005 317->320 PKC_PHOSPHO_SITE PDOC00005

PS00005 322->325 PKC_PHOSPHO_SITE PDOC00005

PS00005 353->356 PKC_PHOSPHO_SITE PDOC00005

PS00005 395->398 PKC_PHOSPHO_SITE PDOC00005

PS00005 500->503 PKC_PHOSPHO_SITE PDOC00005

PS00005 539->542 PKC_PHOSPHO_SITE PDOC00005

PS00005 552->555 PKC_PHOSPHO_SITE PDOC00005

PS00006 89->93 CK2_PHOSPHO_SITE PDOC00006

PS00006 149->153 CK2_PHOSPHO_SITE PDOC00006

PS00006 245->249 CK2_PHOSPHO_SITE PDOC00006

PS00006 264->268 CK2_PHOSPHO_SITE PDOC00006

PS00006 295->299 CK2_PHOSPHO_SITE PDOC00006

PS00006 328->332 CK2_PHOSPHO_SITE PDOC00006

PS00006 337->341 CK2_PHOSPHO_SITE PDOC00006

PS00006 390->394 CK2_PHOSPHO_SITE PDOC00006

PS00006 455->459 CK2_PHOSPHO_SITE PDOC00006

PS00006 481->485 CK2_PHOSPHO_SITE PDOC00006

PS00006 486->490 CK2_PHOSPHO_SITE PDOC00006

PS00006 494->498 CK2_PHOSPHO_SITE PDOC00006

PS00006 498->502 CK2_PHOSPHO_SITE PDOC00006

PS00006 500->504 CK2_PHOSPHO_SITE PDOC00006

PS00006 513->517 CK2_PHOSPHO_SITE PDOC00006

PS00006 559->563 CK2_PHOSPHO_SITE PDOC00006

PS00008 164->170 MYRISTYL PDOC00008

PS00008 256->262 MYRISTYL PDOC00008

PS00008 350->356 MYRISTYL PDOC00008

PS00009 167->171 AMIDATION PDOC00009
DKFZphfbr2_22hl3



group: transmembrane protein 

DKFZphfbr2_22hl3 encodes a novel 520 amino acid protein, with similarity to Drosophila 
melanogaster EG:39E1.3. 

The protein contains the PROSITE ATP/GTP-binding site motif A (P-loop). This loop interacts with one of
the phosphate groups of the bound adenine or guanine nucleotide. It is found in numerous ATP- or GTP-binding
proteins, such as ATP synthase alpha and beta subunits, myosin heavy chains, kinesin heavy
chains and kinesin-like proteins, dynamins and dynamin-like proteins, several kinases, DNA and
RNA helicases, GTP-binding elongation factors and the Ras family of GTP-binding proteins.
Additionally, the novel protein contains one putative transmembrane domain.
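The ATP/GTP-binding site motif A is PROSITE pattern PS00017, [AG]-x(4)-G-K-[ST]. A minimal sketch that locates it in residues 201-220 of the DKFZphfbr2_22hl3 frame 3 peptide (the function name is illustrative); it finds residues 211-218 (GLQGTGKS), which corresponds to the "211-219" range given in the peptide information if that range, like the PROSITE tables in this section, lists an end one past the last matched residue:

```python
import re
from typing import List, Tuple

# PS00017 (ATP_GTP_A): [AG]-x(4)-G-K-[ST]; the lookahead allows overlapping hits.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(peptide: str, first_residue: int = 1) -> List[Tuple[int, int]]:
    """Return (first, last) residue numbers of each P-loop hit."""
    hits = []
    for m in P_LOOP.finditer(peptide):
        first = m.start() + first_residue
        hits.append((first, first + len(m.group(1)) - 1))
    return hits


# Residues 201-220 of the DKFZphfbr2_22hl3 frame 3 peptide.
print(find_p_loops("QTDVLVVGVLGLQGTGKSMV", first_residue=201))  # [(211, 218)]
```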

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



membrane regions: 1 

AC004780_1, differences to predicted gene model

complete cDNA, complete cds, EST hits
on genomic level encoded by AC004780,
differences to predicted gene model
TRANSMEMBRANE 1 

Sequenced by AGOWA 

Locus: unknown

Insert length: 2292 bp

Poly A stretch at pos. 2272, polyadenylation signal at pos. 2255



1 GGGGGAGGGA ACTGATCTCA GCTCGGGCCC GCGTTACATC CTCCTCCTCT 

51 TCTTCCTTCG GCCCAGCTTT CCTTAGGGGC TGCAACCCGG ACGCCGAGGC 

101 CGGTTTCGGA GTGGGGAGTG CCCATTTTCT CTCCTTCCCA CGTTCCTGGC 

151 CCCCAGACGC CATTTGCAGG CGGGTGGCTT GGGTCAGCCT CCCCGCCCCC 

201 ACCCGACTCC CGTCACGGGA GAGCGCACAC CGCGCCCCGA GAACCAATCA 

251 GCAGCCGCGT TAGGTAACCA TGTCTGAGTC TGGACACAGT CAGCCTGGAC 

301 TCTATGGGAT AGAGCGGCGG CGACGGTGGA AGGAGCCTGG CTCTGGTGGC 

351 CCCCAGAATC TCTCTGGGCC TGGTGGTCGG GAGAGGGACT ACATTGCACC 

401 ATGGGAAAGA GAGAGAAGGG ATGCCAGCGA AGAGACAAGC ACTTCCGTCA 

451 TGCAGAAAAC CCCCATCATC CTCTCAAAAC CTCCAGCAGA GCGGTCAAAA 

501 CAGCCACCAC CTCCAACAGC CCCTGCTGCC CCGCCTGCTC CAGCCCCTCT 

551 GGAGAAGCCC ATCGTTCTCA TGAAGCCACG GGAGGAGGGG AAGGGGCCTG 

601 TGGCCGTGAC AGGTGCCTCT ACCCCTGAGG GCACCGCCCC ACCACCCCCT 

651 GCAGCCCCTG CGCCACCCAA GGGGGAGAAG GAGGGGCAGA GACCCACACA 

701 GCCTGTGTAC CAGATCCAGA ACCGGGGCAT GGGCACTGCC GCACCAGCAG 

751 CCATGGACCC TGTCGTGGGT CAGGCCAAAC TACTGCCCCC AGAGCGCATG 

801 AAGCACAGCA TCAAGTTGGT GGATGACCAG ATGAATTGGT GTGACAGTGC 

851 CATCGAGTAC CTGTTGGATC AGACTGATGT GTTGGTGGTT GGTGTCCTGG 

901 GCCTCCAGGG GACAGGCAAG TCCATGGTCA TGTCATTGTT GTCAGCCAAC 

951 ACTCCAGAGG AGGACCAGAG GACTTATGTT TTCCGGGCCC AGAGCGCTGA 

1001 AATGAAGGAA CGAGGGGGCA ACCAGACCAG TGGCATCGAC TTCTTTATTA 

1051 CCCAAGAACG GATTGTTTTC CTGGACACAC AGCCCATCCT GAGCCCTTCT 

1101 ATCCTAGACC ATCTCATCAA TAATGACCGC AAACTGCCTC CAGAGTACAA 

1151 CCTTCCCCAC ACTTACGTTG AAATGCAGTC ACTCCAGATT GCTGCCTTCC 

1201 TTTTCACGGT CTGCCATGTG GTGATTGTTG TCCAGGACTG GTTCACAGAC 

1251 CTCAGTCTCT ACAGGTTCCT GCAGACAGCA GAGATGGTGA AGCCCTCCAC 

1301 CCCATCCCCC AGCCACGAGT CCAGCAGCTC ATCGGGCTCC GATGAAGGCA 

1351 CCGAGTACTA CCCCCACCTA GTCTTCTTGC AGAACAAAGC TCGCCGAGAG 

1401 GACTTCTGTC CTCGGAAGCT GCGGCAGATG CACCTGATGA TTGACCAGCT 

1451 CATGGCCCAC TCCCACCTGC GTTACAAGGG AACTCTGTCC ATGTTACAAT

1501 GCAATGTCTT CCCGGGGCTT CCACCTGACT TCCTGGACTC TGAGGTCAAC 

1551 TTATTCCTGG TACCCTTCAT GGACAGTGAA GCAGAGAGTG AAAACCCACC 

1601 AAGAGCAGGA CCTGGTTCCA GCCCACTCTT CTCCCTGCTG CCTGGGTATC 

1651 GTGGCCACCC CAGTTTCCAG TCCTTGGTGA GCAAGCTCCG GAGCCAAGTG 

1701 ATGTCCATGG CCCGGCCACA GCTGTCACAC ACGATCCTCA CCGAGAAGAA 

1751 CTGGTTCCAC TACGCTGCCC GGATCTGGGA TGGGGTGAGA AAGTCCTCTG 

1801 CTCTGGCAGA GTACAGCCGC CTGCTGGCCT GAGGCCAAGG AGAGGAATGT 

1851 CATGCAGGGG ACCTCCTGGG TCCGCAGTGT ACTGCGAGGG AGCACAGATG 

1901 TCCATCCCCC GCTGGGGTGG AGAGCGGCAG CAGGCCTGAT GGATGAGGGA 

1951 TCGTGGCTTC CCGGCCCAGA GACATGAGGT GTCCAGGGCC AGGCCCCCCA 
2001 CCCTCAGTTG GGGCTGTTCC GGGGGTGACT GTGAGCGATC CCACCCCAAA

2051 CCTGAGATGG GGTAGCCCGT CCTGTGTCCT CCACAGGGAC AAGCAGTGGG 

2101 AGGAGTCTGA ATGGTCACCA GGAAGCCCGG GCTCCATCTT GACCTCCTTT 

2151 TTCAGGGACA GGAGCAACAG GCCCCTCTTC CCTGACTCTA AGCCCTTCCC 

2201 TGTAAGGTGA GGCAGGGTCT GGAGAGCTCT TTATTGGAAC AGATCTGGTG 

2251 GTTCAAATAA ACACAGTCAT GCAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



Entry AC004780 from database EMBL: 

Homo sapiens chromosome 19, cosmid F17127, complete sequence. 
Score = 2616, P = 0.0e+00, identities = 524/525 
15 exons Bp 8031-31789 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 270 bp to 1829 bp; peptide length: 520 
Category: similarity to unknown protein 
Prosite motifs: ATP_GTP_A (211-219) 
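The ORF coordinates and peptide lengths reported for the clones in this section are consistent with 1-based, inclusive coordinates that leave out the stop codon. For this clone, (1829 - 270 + 1) / 3 = 520 codons, and the TGA stop occupies bases 1830-1832 of the listed sequence. Below is a minimal Python check of that arithmetic. The function name is illustrative and not part of the original pipeline.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumes the stop codon lies just after orf_end and is not counted.
    """
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3


# (ORF start, ORF end, reported peptide length) as listed for each clone
REPORTED = {
    "DKFZphfbr2_22hl3": (270, 1829, 520),
    "DKFZphfbr2_22i4": (511, 1194, 228),
    "DKFZphfbr2_22k3": (779, 2392, 538),
    "DKFZphfbr2_22k8": (10, 525, 172),
}

for clone, (start, end, reported) in REPORTED.items():
    print(f"{clone}: {peptide_length(start, end)} aa computed, {reported} aa reported")
```

A span that is not a multiple of three raises an error instead of being silently truncated.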



1 MSESGHSQPG LYGIERRRRW KEPGSGGPQN LSGPGGRERD YIAPWERERR 
51 DASEETSTSV MQKTPIILSK PPAERSKQPP PPTAPAAPPA PAPLEKPIVL 
101 MKPREEGKGP VAVTGASTPE GTAPPPPAAP APPKGEKEGQ RPTQPVYQIQ 
151 NRGMGTAAPA AMDPVVGQAK LLPPERMKHS IKLVDDQMNW CDSAIEYLLD 
201 QTDVLVVGVL GLQGTGKSMV MSLLSANTPE EDQRTYVFRA QSAEMKERGG 
251 NQTSGIDFFI TQERIVFLDT QPILSPSILD HLINNDRKLP PEYNLPHTYV 
301 EMQSLQIAAF LFTVCHVVIV VQDWFTDLSL YRFLQTAEMV KPSTPSPSHE 
351 SSSSSGSDEG TEYYPHLVFL QNKARREDFC PRKLRQMHLM IDQLMAHSHL 
401 RYKGTLSMLQ CNVFPGLPPD FLDSEVNLFL VPFMDSEAES ENPPRAGPGS 
451 SPLFSLLPGY RGHPSFQSLV SKLRSQVMSM ARPQLSHTIL TEKNWFHYAA 
501 RIWDGVRKSS ALAEYSRLLA 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_22hl3, frame 3 

TREMBL:AC004780_1 product: M F17127_1" ; Homo sapiens chromosome 19, 

cosmid F17127, complete sequence., N = 2, Score = 1264, P » 1.3e-23l 

TREMBL:CEY54E2A_1 gene: "Y54E2A.2"; Caenorhabditis elegans cosmid 
Y54E2A, N = 2, Score = 219, P = 1.4e-15 



>TREMBL:AC004780_1 product: "F17127_1 M ; Homo sapiens chromosome 19, cosmid 
F17127, complete sequence. 
Length » 528 



HSPs: 



Score = 1264 (189.6 bits), Expect = 1.3e-231, Sum P(2) = 1.3e-231 
Identities = 254/302 (84%), Positives = 264/302 (87%) 
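The percentages in these BLAST headers appear to be truncated rather than rounded. Elsewhere in this section, 97/217 (44.7%) is printed as 44% and 35/82 (42.7%) as 42%. The sketch below counts identical columns in an aligned pair and formats the percentage the same way. The truncation rule is inferred from the numbers in this document, not taken from BLAST documentation.

```python
from __future__ import annotations


def count_identities(query_row: str, sbjct_row: str) -> tuple[int, int]:
    """Return (identical columns, columns) for two equal-length aligned rows.

    A gap character '-' in either row never counts as an identity.
    """
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same length")
    same = sum(1 for q, s in zip(query_row, sbjct_row) if q == s and q != "-")
    return same, len(query_row)


def blast_percent(numerator: int, denominator: int) -> int:
    """Whole-number percentage, truncated rather than rounded (97/217 -> 44)."""
    return 100 * numerator // denominator


# First ten columns of this HSP (query 46-55 against sbjct 39-47)
same, cols = count_identities("ERERRDASEE", "EKER-DSDSD")
print(same, cols)                # 4 10
print(blast_percent(254, 302))   # 84
```

Counting positives as well would additionally require a substitution matrix such as BLOSUM62, which this sketch leaves out.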



Query: 46 ERERRDASEETSTSVMQKTPIILSKPPAERSKQPPPPTAPAAPPAPAPLEKPIVLMKPRE 105 

E+ER D+ + S +Q+T + R + P + A APLEKPIVLMKPRE 

Sbjct: 39 EKER-DSDSDFSP--LQQTEGCQRRDKHFRHAENPHHPLKTSSRA-APLEKPIVLMKPRE 94 

Query: 106 EGKGPVAVTGASTPEGTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPV 165 

EGKGPVAVTGASTPEGTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPV 

Sbjct: 95 EGKGPVAVTGASTPEGTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPV 154 

Query: 166 VGQAKLLPPERMKHSIKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLS 225 

VGQAKLLPPERMKHSIKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLS 

Sbjct: 155 VGQAKLLPPERMKHSIKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLS 214 
Query: 226 ANTPEEDQRTYVFRAQSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINN 285 

ANTPEEDQRTYVFRAQSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINN 

Sbjct: 215 ANTPEEDQRTYVFRAQSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINN 274 

Query: 286 DRKLPPEYNLPHTYVEMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYRFLQTAEMVKPSTP 345 

DRKLPPEYNLPHTYVEMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYR        K ++ 

Sbjct: 275 DRKLPPEYNLPHTYVEMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYRLWDLGCKCKSNSH 334 

Query: 346 SP 347 

SP 

Sbjct: 335 SP 336 

Score = 993 (149.0 bits), Expect = 1.3e-231, Sum P(2) = 1.3e-231 
Identities = 189/189 (100%), Positives = 189/189 (100%) 

Query: 332 RFLQTAEMVKPSTPSPSHESSSSSGSDEGTEYYPHLVFLQNKARREDFCPRKLRQMHLMI 391 

RFLQTAEMVKPSTPSPSHESSSSSGSDEGTEYYPHLVFLQNKARREDFCPRKLRQMHLMI 

Sbjct: 340 RFLQTAEMVKPSTPSPSHESSSSSGSDEGTEYYPHLVFLQNKARREDFCPRKLRQMHLMI 399 

Query: 392 DQLMAHSHLRYKGTLSMLQCNVFPGLPPDFLDSEVNLFLVPFMDSEAESENPPRAGPGSS 451 

DQLMAHSHLRYKGTLSMLQCNVFPGLPPDFLDSEVNLFLVPFMDSEAESENPPRAGPGSS 

Sbjct: 400 DQLMAHSHLRYKGTLSMLQCNVFPGLPPDFLDSEVNLFLVPFMDSEAESENPPRAGPGSS 459 

Query: 452 PLFSLLPGYRGHPSFQSLVSKLRSQVMSMARPQLSHTILTEKNWFHYAARIWDGVRKSSA 511 

PLFSLLPGYRGHPSFQSLVSKLRSQVMSMARPQLSHTILTEKNWFHYAARIWDGVRKSSA 

Sbjct: 460 PLFSLLPGYRGHPSFQSLVSKLRSQVMSMARPQLSHTILTEKNWFHYAARIWDGVRKSSA 519 

Query: 512 LAEYSRLLA 520 

LAEYSRLLA 

Sbjct: 520 LAEYSRLLA 528 



Pedant information for DKFZphfbr2_22hl3, frame 3 
Report for DKFZphfbr2_22hl3.3 



[LENGTH] 520 

[MW] 57650.81 

[pI] 6.52 

[HOMOL] TREMBL:AC004780_1 product: "F17127_1"; Homo sapiens chromosome 19, cosmid F17127, complete sequence. 0.0 

[PROSITE] ATP_GTP_A 1 

[PROSITE] MYRISTYL 8 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 8 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 3 

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 11.73 % 



SEQ MSESGHSQPGLYGIERRRRWKEPGSGGPQNLSGPGGRERDYIAPWERERRDASEETSTSV 

SEG 

PRD cccccccccccccccccccccccccccccccccccccceeeeehhhhhhhhhccccccee 

MEM 

SEQ MQKTPIILSKPPAERSKQPPPPTAPAAPPAPAPLEKPIVLMKPREEGKGPVAVTGASTPE 

SEG xxxxxxxxxxxxxxx 

PRD eeccceeecccccccccccccccccccccccccccceeeeeccccccccceeeecccccc 

MEM 

SEQ GTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPVVGQAKLLPPERMKHS 

SEG . .xxxxxxxxxxx 

PRD cccccccccccccccccccccccceeeeeeccccccccccccceeecceeecccchhhhh 

MEM 

SEQ IKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLSANTPEEDQRTYVFRA 

SEG xxxxxxxxxxxxxxxxxxx 

PRD hhhhcccchhhhhhhhhhccccceeeeeecccccccchhhhhhhhccccchhhhhheeee 

MEM 

SEQ QSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINNDRKLPPEYNLPHTYV 

SEG 

PRD hhhhhhhcccccceeeeeeeecceeeeeeccccccccccccccccccccccccccccchh 

MEM 

SEQ EMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYRFLQTAEMVKPSTPSPSHESSSSSGSDEG 

SEG xxxxxxxxxxxxxxxx. . . 

PRD hhhhhhhhhhhhhhhheeeeeeeccchhhhhhhhhhhhhhhccccccccccccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ TEYYPHLVFLQNKARREDFCPRKLRQMHLMIDQLMAHSHLRYKGTLSMLQCNVFPGLPPD 

SEG 

PRD cccccceeeehhhhhhhcccccchhhhhhhhhhhhhhhhhhccccccccccccccccccc 

MEM 

SEQ FLDSEVNLFLVPFMDSEAESENPPRAGPGSSPLFSLLPGYRGHPSFQSLVSKLRSQVMSM 

SEG 

PRD chhhhhheeeeeccccccccccccccccccccceeeccccccccchhhhhhhhhhhhhhh 

MEM 

SEQ ARPQLSHTILTEKNWFHYAARIWDGVRKSSALAEYSRLLA 

SEG 

PRD hhhhhhhheeeccchhhhhhhhhhhhcchhhhhhhhhccc 

MEM 
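The MEM track marks one predicted membrane-spanning stretch, and the report lists TRANSMEMBRANE 1. The report does not say which predictor produced it, and the column alignment of the M run was lost in extraction. The sketch below uses the classic Kyte-Doolittle hydropathy window purely as an illustration. This is a simpler technique swapped in for demonstration, not the method Pedant used. The window of 19 and the threshold of 1.6 are conventional choices for that method, not values from this document.

```python
from __future__ import annotations

# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along seq, sliding one residue at a time."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy exceeds threshold."""
    return [i + 1 for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


# Residues 301-330 of DKFZphfbr2_22hl3.3, taken from the row whose MEM track is marked
segment = "EMQSLQIAAFLFTVCHVVIVVQDWFTDLSL"
print([300 + i for i in hydrophobic_windows(segment)])  # window starts in protein numbering
```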



Prosite for DKFZphfbr2_22hl3.3 

PS00001   30->34    ASN_GLYCOSYLATION   PDOC00001 
PS00001   251->255  ASN_GLYCOSYLATION   PDOC00001 
PS00002   32->36    GLYCOSAMINOGLYCAN   PDOC00002 
PS00004   507->511  CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   180->183  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   215->218  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   491->494  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   117->121  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   193->197  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   228->232  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   254->258  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   277->281  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   298->302  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   355->359  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   436->440  CK2_PHOSPHO_SITE    PDOC00006 
PS00008   26->32    MYRISTYL            PDOC00008 
PS00008   139->145  MYRISTYL            PDOC00008 
PS00008   153->159  MYRISTYL            PDOC00008 
PS00008   211->217  MYRISTYL            PDOC00008 
PS00008   214->220  MYRISTYL            PDOC00008 
PS00008   249->255  MYRISTYL            PDOC00008 
PS00008   356->362  MYRISTYL            PDOC00008 
PS00008   505->511  MYRISTYL            PDOC00008 
PS00017   211->219  ATP_GTP_A           PDOC00017 

(No Pfam data available for DKFZphfbr2_22hl3.3) 
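The Prosite hits are matches to short sequence patterns. For example, the published patterns are N-{P}-[ST]-{P} for ASN_GLYCOSYLATION and [AG]-x(4)-G-K-[ST] for ATP_GTP_A. Judging from the hits listed for this clone, the end coordinate in these tables appears to be exclusive. ASN_GLYCOSYLATION at 30->34 corresponds to the four residues NLSG at positions 30-33. The sketch below converts a simple PROSITE pattern to a Python regular expression and reports hits in that style. It handles only residues, x, x(n), x(n,m), [..] and {..}, not terminal anchors or other syntax.

```python
from __future__ import annotations

import re

# One PROSITE element: residue, x, [..] or {..}, optionally followed by (n) or (n,m)
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern such as 'N-{P}-[ST]-{P}' into a regex.

    Terminal anchors '<' and '>' are not supported in this sketch.
    """
    parts = []
    for token in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(token)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {token!r}")
        core, low, high = m.groups()
        if core == "x":
            rx = "."                       # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            rx = core                      # single residue or [..] set
        if low is not None:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(rx)
    return "".join(parts)


def prosite_hits(seq: str, pattern: str, offset: int = 0) -> list[str]:
    """All (overlapping) matches as 'start->end': 1-based start, exclusive end.

    offset is the number of residues that precede seq in the full protein.
    """
    finder = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    hits = []
    for m in finder.finditer(seq):
        start = m.start() + 1 + offset
        hits.append(f"{start}->{start + len(m.group(1))}")
    return hits


# Residues 201-230 of DKFZphfbr2_22hl3.3 scanned with the ATP_GTP_A (P-loop) pattern
print(prosite_hits("QTDVLVVGVLGLQGTGKSMVMSLLSANTPE", "[AG]-x(4)-G-K-[ST]", offset=200))
# ['211->219']
```

The zero-width lookahead lets overlapping hits be reported, as in the MYRISTYL entries at 211->217 and 214->220.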
DKFZphfbr2_22i4 



group: brain derived 

DKFZphfbr2_22i4.1 encodes a novel 228 amino acid protein with similarity to the N-terminus of human P52rIPK. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

similarity to Human P52rIPK N-terminus 
complete cDNA, complete cds, few EST hits 

function of P52rIPK: repressor of the protein kinase inhibitor P58IPK 
upstream regulator of interferon-induced proteins 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 4748 bp 

Poly A stretch at pos. 4726, polyadenylation signal at pos. 4709 

1 TGGGTCCGGT CCTAGGGTCA CACCCACCGC AGGGTCTGGC TTGGTACAGT 
51 TGGGTGCATG CAGAAGTAGG TGGAGCTGCT GTTGCAGCCT TGAGAGAGTT 

101 TTATTGTAAA ACTCTTGTAA TTTATAGTAA TCGGAGGGGA AAACACCTCT 

151 TCCTTTTAAT TGCTCTGAGG ACCGCTGCCA AAGAAACGCA GTAGATCCGC 

201 TCCCTCTTGG GGGCGGGGAG AAAGAACGGG TTGTGTCCGC CATGTTGGTG 

251 AAGTCAAGCG AAGGCGACTA GAGCTCCAGG AGGGCCAGTT CTGTGGGCTC 

301 TAGTCGGCCA TATTAATAAA GAGAAAGGGA AGGCTGACCG TCCTTCGCCT 

351 CCGCCCCCAC ATACACACCC CTTCTTCCCA CTCCGCTCTC ACGACTAAGC 

401 TCTCACGATT AAGGCACGCC TGCCTCGATT GTCCAGCCTC TGCCAGAAGA 

451 AAGCTTAGCA GCCAGCGCCT CAGTAGAGAC CTAAGGGCGC TGAATGAGTG 

501 GGAAAGGGAA ATGCCGACCA ATTGCGCTGC GGCGGGCTGT GCCACTACCT 

551 ACAACAAGCA CATTAACATC AGCTTCCACA GGTTTCCTTT GGATCCTAAA 

601 AGAAGAAAAG AATGGGTTCG CCTGGTTAGG CGCAAAAATT TTGTGCCAGG 

651 AAAACACACT TTTCTTTGTT CAAAGCACTT TGAAGCCTCC TGTTTTGACC 

701 TAACAGGACA AACTCGACGA CTTAAAATGG ATGCTGTTCC AACCATTTTT 

751 GATTTTTGTA CCCATATAAA GTCTATGAAA CTCAAGTCAA GGAATCTTTT 

801 GAAGAAAAAC AACAGTTGTT CTCCAGCTGG ACCATCTAAT TTAAAATCAA 

851 ACATTAGTAG TCAGCAAGTA CTACTTGAAC ACAGCTATGC CTTTAGGAAT 

901 CCTATGGAGG CAAAAAAGAG GATCATTAAA CTGGAAAAAG AAATAGCAAG 

951 CTTAAGAAGA AAAATGAAAA CTTGCCTACA AAAGGAACGC AGAGCAACTC 
1001 GAAGATGGAT CAAAGCCACG TGTTTGGTAA AGAATTTAGA AGCAAATAGT 
1051 GTATTACCTA AAGGTACATC AGAACACATG TTACCAACTG CCTTAAGCAG 
1101 TCTTCCCTTG GAAGATTTTA AGATCCTTGA ACAAGATCAA CAAGATAAAA 
1151 CACTGCTAAG TCTAAATCTA AAACAGACCA AGAGTACCTT CATTTAAATT 
1201 TAGCTTGCAC AGAGCTTGAT GCCTATCCTT CATTCTTTTC AGAAGTAAAG 
1251 ATAATTATGG CACTTATGCC AAAATTCATT ATTTAATAAA GTTTTACTTG 
1301 AAGTAACATT ACTGAATTTG TGAAGACTTG ATTACAAAAG AATAAAAAAC 
1351 TTCATATGGA AATTTTATTT GAAAATGAGT GGAAGTGCCT TACATTAGAA 
1401 TTACGGACTT AAAAATTTTG CTAATAAATT GTGTGTTTGA AAGGTGTTTT 
1451 TTGTTTTTGT CTTTTTAAAC TACTGTTAAA AGAACAGCTT ATGATAAGTA 
1501 ATATGTTTAA CTTAGAGAAG AATTTTTTCC TGTACCAAAG TTGGCATATT 
1551 GCATTCTAAA TAAGATGCTA AATAAGAGTT AACCAACATT CAACATGACC 
1601 TTAAAACTGC TGGGTTTTGT ATTAATTAAA TTATAATTGG CACTGTGATT 
1651 TGAAAAATTT ATAGAAAAAA AGGTACAGGG CAAGTTTTTA AATTAAAACT 
1701 TTCTATATTT TGTTTTACCA GTAAAAGTGA GCTTATCATG GCCTCTCTCA 
1751 TAAGAATGAT TTTAAAATAG GTTGTAAAAT ATTTTGAAAA TATTTGAATG 
1801 TGAAGTACCA TTGAGTCATC CAAACTAGGT AAGGCCTCAA GTACTTTAAA 
1851 CTAGTAAAAT CTAGTAGCTG ATAATATTCA CCTAAGTAAG TGTTGTAAAA 
1901 TAATTCAGAG TTCAGGACCT AGCTTAGATA AATGTATACT ACTCTTTTTC 
1951 TCATAGTAAA AATCTTACAT TTCCAACTTC AAAATTGGTG CTTCCATATT 
2001 TGTTGATAAC CAAAACTCCT AAGGTTTTTT GTTTTCTTTT TAACTACTTT 
2051 CCAAATGCAT ACTATACCTC AGAAATAGTG TATCAATATA GTGGGCTTTT 
2101 TTTTTCCTCT TCATAAACCC ACAGTAAAAT TTAATCACAG GAAACTACTT 
2151 ATATCTTCAC ACTTTGTATT GATAACTTAA AATGGCATCA GTTTATCTTA 
2201 GACATCAGCT TGCTTTTTAT CTCCTTTTTT AGTGAGTGAA ATAGAGCAAC 
2251 TAGCATGCCT GTGTTCCCAG CTACTTGGGA GGCTAAGGTG GGAAGATCAA 
2301 TTGAACCTAG GAGGTTGAGG CTATAGTGAG CTGTGATTGC ACGACTGCAC 
2351 TCCAGCCTGG GCAATGGAGT GAGACTCCTG TCTCTAAAAC AGCAACAACA 
2401 AAAATAAAGC AACCATAGTG CATAAGGGAA ATTAAATGTT CCCTATAGAA 
2451 ATATGTGTAT GTCTGTGATA GTGGTATGCA AATGCTAATT ATTTTATAAA 
2501 ATAAAAGTTC AGAACTATTC TTATCATTGC CACTTGAACA ATTAAAGGGT 
2551 TTGCTTTATT TCACTAATGT TTAATAGGAA CCCTTTGCTT CAAACAGCTT 
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2601 TGTTGAAATC ATGTAAAAAT TTGTTAATAG AGAATCAAGT TATTTAACTC 
2651 AACTTATTTA ATTCAAGCTT GTGATACTAA CATACAAAGG TAGCATAAAC 
2701 CAAGTCATAA ATTGCTGTAA TCTTTCCTGT AGAGTAATAG CTACTTCATG 
2751 ATTTTTTTAA AAATTTCATT TTTTTGCTAT TTAGGATTGC ATTTGCTTGG 
2801 CTCCTAGTAA CAATTCTTTT ACAGTATTAG CACTCTCTTT ACTAAGGAAT 
2851 GCCTCCCAAG GAAATGCAAA GGTAGGAAAA GTCTCTTAGA ATGCCCATGA 
2901 GGTATTTAAA ACAGATATTT ATGAAAATCT TTTTGTGAAT GTTATAAATC 
2951 TTGCTAGTTA TTTTATCTTT ATCTTAAGTA TTAGATGTAG TTCCTTGGAA 
3001 TTGTCATTAC ATATTTATTT TTTTCTAGTG TGGTTTCAAA TAACTTTTTG 
3051 CCAACATATA ATCATCATCA AACATTCACT GACCATATCT ATTTTATAAC 
3101 TCAAAATAAG TTGGACAAAT AATCATTTTA ATAAAAACTA TTTTTTCCAA 
3151 GTATAACCAC TGTCATGTGG TTCACCCTTC ACCCCAGATA CAAAACACTT 
3201 ATTTGTGTAG CCCAGTTCCC ATCTACAGTA ATACCTTGAA ACCTTAATAA 
3251 ATTTTAAAAA TCATAAAAAT AAAATATTGT AAAATACAAC AAATTTTGGA 
3301 CAAGGTTACT TCATCTTCAT TCATTATTAC CTGACAGTAT TAAACTACTA 
3351 CTCAATAATT TTAGAGTAAA CTTTTCTGTG TTTTCCCCGT GATTTTCATT 
3401 GTGCTGTCCT GACAACATGC TCCAAACTCT TTGCATCAAA TTGTTTTATT 
3451 AACATACATT TGTCTACCTT AAAACTAGCT TTATTCACAG AGAAAGACCT 
3501 AAAAGGAGTC TATTAAAATG CTGCTTTCAG TTTGATAGTT TTTTTTTTAA 
3551 TCACTCTGAC CATAAACTAA CTGAAATTAT AATGGATTTT TTTTCCTCTC 
3601 CCGGTCACAA CACAGATCTT CTGTTCATTT GTTCTCTGTC TACTGGGCAC 
3651 CAACCTCTAC AAAGAACCAG CCAAAGGCTA GGTACTTGAT ATAAAAAGGA 
3701 ATATTACATT ATTTTCTGCC CTCAAGTTGC TCTATCTCCT GAAAGAAACA 
3751 AGTAATATTT ATAATACAAT ATGATAAATG CTACAAAAGA AATAGCTGTA 
3801 AAGTCCTTTG GTAAATGCTG TTGAATTGGA ATTCAGTAAG AACTATAAAC 
3851 TGTAGACCTT TTTATAATCA AATGCTTTTG TCTTGAAACA AAACAGATTC 
3901 CTCCTTATAT TGACTTAGCA AAGGAGGTAC AAGGACATTG GCATTTGACC 
3951 TGAATTATGG TGTTTTATTG AATGAGCTAT AAGACAACAT TTTTACCCTT 
4001 TAAAATGAAC ACTGAACAAA TGTGTTAATG GTATCTTTGT TAAAAGGAAA 
4051 ACATAGCTAT AAATAAAATA CTACATCGAA ATCCAGCACT GGAGTTCATT 
4101 TGAAATTTGA TATTTTGTGT AAAGTAACAA ACCTATTAAC ACAGATTTTT 
4151 AAAATAACTC AGAATCGTAT AAAGCACTTT GGTACTTATT TGTTCTCTTT 
4201 TCCCTTACAT TCTGTGTGGT AGGTGGTATT ATCTCTGATT TACACATGAA 
4251 GACATCCTTG TTAATGCAAT TTATTTATTC ATTCGGGCAT TTACTGTGTG 
4301 CCAACTTGCA AAAGGAATAG AAATGTCTGT GATCTAGATA GTTCTAGATT 
4351 GAACATAGAT TTTCTGCCAA CAAATCCTCT CTGCTGTTCA CATTATCCTT 
4401 TGTTTAACGT ATGAACCAGG TTACTAAAAT AGGATAAATC ATGTGTCTTA 
4451 GAATATGAAA ATAGTAAGGT CTTTGAGGTC ACTTGATCTT CTCTAAGTAG 
4501 ACTTTATAAT ATTGTGTTTT ATCTCATTTC TCAATATTAG AATACGGGTA 
4551 GATTTTAATT TTGCTATAAT ATAGGAAATG GTTCATCTTT GTACCAAAAT 
4601 ATTGCATTCT TCTGATATTT AGACAGTTGG AAACTTTCTA AAATTGAGGA 
4651 TTTTGTAGTG TATACTAAAT AATTGCATAT TCAAAAAAAT GTATTCTGAG 
4701 TATGGTGATA TTAAACATTT TTCCCCAAAA AAAAAAAAAA AAAAAAAA 



BLAST Results 

No BLAST result 

Medline entries 



98107671: 

Regulation of interferon-induced protein kinase PKR: 
modulation of P58IPK inhibitory function by a novel protein, 
P52rIPK 



Peptide information for frame 1 



ORF from 511 bp to 1194 bp; peptide length: 228 
Category: similarity to known protein 
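As a quick consistency check, translating the first codons of an ORF with the standard genetic code reproduces the start of the listed peptide. Bases 511-540 of this clone give MPTNCAAAGC. Below is a minimal sketch that covers the standard code only (NCBI translation table 1). The helper names are illustrative.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# T, C, A, G order for each position: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon; '*' is a stop, 'X' an unrecognised codon."""
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


# Bases 511-540 of DKFZphfbr2_22i4, written in the same blocks of ten as the listing
print(translate("ATGCCGACCA ATTGCGCTGC GGCGGGCTGT"))  # MPTNCAAAGC
```

Spaces are stripped first, so the blocks of ten can be pasted in directly from the listing.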



1 MPTNCAAAGC ATTYNKHINI SFHRFPLDPK RRKEWVRLVR RKNFVPGKHT 
51 FLCSKHFEAS CFDLTGQTRR LKMDAVPTIF DFCTHIKSMK LKSRNLLKKN 
101 NSCSPAGPSN LKSNISSQQV LLEHSYAFRN PMEAKKRIIK LEKEIASLRR 
151 KMKTCLQKER RATRRWIKAT CLVKNLEANS VLPKGTSEHM LPTALSSLPL 
201 EDFKILEQDQ QDKTLLSLNL KQTKSTFI 

BLASTP hits 

Entry AF007393_1 from database TREMBL: 

product: "P52rIPK"; Homo sapiens P52rlPK mRNA, complete cds. 
Score - 166, P « 2.5e-ll, identities - 40/106, positives = 56/106 



WO 01/12659 

PCT/IB00/01496 



Alert BLASTP hits for DKFZphfbr2_22i4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_22i4, frame 1 



Report for DKFZphfbr2_22i4.1 



[LENGTH] 228 

[MW] 26259.94 

[pI] 10.17 

[HOMOL] TREMBL:AF007393_1 product: "P52rIPK"; Homo sapiens P52rIPK mRNA, complete cds. 1e-09 

[PROSITE] MYRISTYL 1 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 4 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 7.02 % 
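The LOW_COMPLEXITY keyword matches the share of residues that SEG masks (the x characters in the SEG track) relative to the peptide length. The 16 masked residues of this clone give 16/228 = 7.02%. The 11.73% and 18.03% figures in the other two reports in this section correspond to 61/520 and 97/538. Below is a minimal sketch of that calculation, assuming this reading of the keyword.

```python
from __future__ import annotations


def low_complexity_percent(seg_rows: list[str], length: int) -> float:
    """Share of residues masked as 'x' in the SEG rows, as a percentage (2 decimals)."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100 * masked / length, 2)


# SEG rows of DKFZphfbr2_22i4.1: only the second row masks residues (16 of them)
seg_rows = ["", "x" * 16, "", ""]
print(f"{low_complexity_percent(seg_rows, 228):.2f} %")  # 7.02 %
```

Only the x characters are counted, so spacing and dots left over from the report layout do not affect the result.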



SEQ MPTNCAAAGCATTYNKHINISFHRFPLDPKRRKEWVRLVRRKNFVPGKHTFLCSKHFEAS 

SEG 

PRD cccccccccccccccccccceeeecccccchhhhhhhhhhhhhcccccceeehhhhhhhh 

SEQ CFDLTGQTRRLKMDAVPTIFDFCTHIKSMKLKSRNLLKKNNSCSPAGPSNLKSNISSQQV 

SEG xxxxxxxxxxxxxxxx 

PRD cccccccccccccccccceeeeccccchhhhhhhhhhhccccccccccccccccccchhh 

SEQ LLEHSYAFRNPMEAKKRIIKLEKEIASLRRKMKTCLQKERRATRRWIKATCLVKNLEANS 

SEG 

PRD hhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeecccccc 

SEQ VLPKGTSEHMLPTALSSLPLEDFKILEQDQQDKTLLSLNLKQTKSTFI 

SEG 

PRD cccccccccccccccccccccchhhhhhcccccccccccccccccccc 
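The PRD track assigns one predicted secondary-structure state to each residue. This sketch assumes the usual reading h = helix, e = extended strand and c = coil; the report itself does not define the letters. Under that assumption, the state composition can be tallied across the wrapped rows. The rule Pedant uses to assign keywords such as All_Alpha is not stated here, and this sketch does not attempt to reproduce it.

```python
from __future__ import annotations

from collections import Counter


def ss_composition(prd_rows: list[str]) -> dict[str, float]:
    """Fraction of residues in each predicted state across wrapped PRD rows."""
    states = Counter("".join(row.strip() for row in prd_rows))
    total = sum(states.values())
    if total == 0:
        return {}
    return {state: count / total for state, count in sorted(states.items())}


# Last PRD row of DKFZphfbr2_22i4.1 (residues 181-228)
print(ss_composition(["cccccccccccccccccccccchhhhhhcccccccccccccccccccc"]))
```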



Prosite for DKFZphfbr2_22i4.1 

PS00001   19->23    ASN_GLYCOSYLATION   PDOC00001 
PS00001   100->104  ASN_GLYCOSYLATION   PDOC00001 
PS00001   114->118  ASN_GLYCOSYLATION   PDOC00001 
PS00004   160->164  CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   68->71    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   88->91    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   147->150  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   163->166  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   60->64    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   78->82    CK2_PHOSPHO_SITE    PDOC00006 
PS00008   9->15     MYRISTYL            PDOC00008 

(No Pfam data available for DKFZphfbr2_22i4.1) 
DKFZphfbr2_22k3 
group: brain derived 

DKFZphfbr2_22k3 encodes a novel 538 amino acid protein with weak similarity to extensins. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

weak similarity to extensins 

complete cDNA, complete cds, few EST hits 
CpG island in 5' UTR; complete cDNA 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2775 bp 

Poly A stretch at pos. 2755, polyadenylation signal at pos. 2718 
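The notes for this clone flag a CpG island in the 5' UTR but do not say how it was called. A common rule of thumb (Gardiner-Garden and Frommer, 1987) has three conditions: a window of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6. The ratio is computed as CpG count × length / (C count × G count). The minimal sketch below illustrates that calculation; it is not necessarily the method used for this report.

```python
from __future__ import annotations


def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one DNA window."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / len(seq)
    obs_exp = cpg * len(seq) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def meets_cpg_island_criteria(seq: str) -> bool:
    """Gardiner-Garden & Frommer style test: >= 200 bp, GC > 50%, obs/exp CpG > 0.6."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= 200 and gc_fraction > 0.5 and obs_exp > 0.6


# First 50 bases of DKFZphfbr2_22k3 (too short to qualify as a window on its own)
print(cpg_stats("GGGGCTGCCCGCGCGCTCCACGGTGCAGAGCTCTAAGCGCGCGGGCTGGC"))
```

In practice these criteria are applied over sliding windows along the sequence, not to a single fixed stretch.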

1 GGGGCTGCCC GCGCGCTCCA CGGTGCAGAG CTCTAAGCGC GCGGGCTGGC 

51 AGGCTGCGGC GCGTCAAGGT CAGCCTGGAG CTGGGTGGCG GCCTGCCTGG 

101 GGGCGGGGGA CCCTACTGGA GGCCCGGGCT GGGGCCTCCC AGCGCCTCGG 

151 CCATATTGAA TAGCTTCGAC TGGACCGTCT TTGTCTGCGA AGTCCTGTCC 

201 CAAGTTCCAG CCGCGTCCCT GGGGCCTGGG GCAGGAAGAG TCGCTGGCAG 

251 CCCGCGCGCC CCAACTTGGA GCTGGGACAC CACGTTTCCA GCTTGGAGTG 

301 GGCCTTGAGC CTTGGGACTG ACCTCGCCCC CGGCTCACGT AGGCATCCTG 

351 GAAATTGATT CCCCCAAGTC CTTGGTGGGG GAGCCGGACT TGGTCAAGAC 

401 TGTACTTGTT GCAGGCGAAG AGATTGGAGG CGTTTGGCTC GTCCCTGGCT 

451 AGGGAGGTGA GACTCTCCGG TCAGCGTTGC TGGAACTCCC CCCATCCAGT 

501 CCCTCCCTCA AGACTAAGGG CTACAGTAGT TTGTTGGGGC TCATTGCCCC 

551 CTCACCCCAG ATATCACCCT GGAGATCTTA AAGACTCTCG AGAAAAGCCA 

601 CGTGGGGGGC TGGTTCCCCT GGGGCTTCCT GCCGTCCCCC GACTGCCTCA 

651 TTCTTTGGAG CGTCCCCGAT GTCTGCAAAG ATGTGGATTT GGACGTCCTC 

701 GTGGAAGCCC TAAAGCCCGT GGGGACATTT AAGAAGATCG GCAAGGTGTT 

751 CCGCAAGGAG GAGGACTCCA CGGTGGGGAT GCTGCAGATC GGGGAGGACG 

801 TCGACTATTT GCTCATCCCC CGGGAGGTCA GGCTGGCTGG GGGCGTCTGG 

851 AGAGTCATCT CTAAGCCCGC CACCAAGGAA GCAGAATTTC GGGAGCGGCT 

901 GACCCAGTTC CTGGAAGAAG AGGGCCGCAC CCTGGAGGAC GTGGCCCGCA 

951 TCATGGAGAA GAGCACCCCG CACCCGCCCC AGCCCCCCAA AAAGCCCAAG 

1001 GAGCCCCGAG TGAGGAGGAG AGTGCAGCAG ATGGTGACTC CTCCGCCCCG 

1051 GCTGGTCGTG GGCACGTACG ACAGCAGCAA CGCCAGCGAC AGCGAGTTCA 

1101 GCGACTTCGA GACCTCCAGA GACAAGAGCC GCCAGGGCCC GCGGCGGGGC 

1151 AAGAAGGTGC GCAAAATGCC CGTCAGCTAC CTGGGCAGCA AGTTCCTGGG 

1201 AAGCGACCTG GAGAGTGAGG ATGATGAGGA ACTGGTCGAG GCCTTCCTCC 

1251 GGCGACAGGA GAAGCAGCCC AGCGCGCCGC CTGCCCGCCG CCGCGTCAAC 

1301 CTGCCAGTGC CCATGTTTGA GGACAACCTG GGGCCTCAGC TGTCCAAAGC 

1351 GGACAGGTGG CGGGAGTATG TCAGCCAGGT GTCCTGGGGG AAGCTGAAGC 

1401 GGAGGGTGAA GGGTTGGGCG CCGAGGGCGG GCCCCGGGGT GGGCGAGGCC 

1451 CGGCTGGCCT CCACCGCAGT GGAGAGCGCA GGGGTATCAT CGGCGCCAGA 

1501 GGGCACCAGC CCGGGGGATC GCTTGGGAAA CGCGGGAGAT GTTTGTGTGC 

1551 CCCAGGCTTC CCCTAGGCGA TGGAGGCCCA AGATCAACTG GGCCTCCTTT 

1601 CGGCGCCGCA GGAAGGAGCA GACAGCACCC ACAGGTCAGG GGGCAGACAT 

1651 CGAGGCTGAT CAGGGGGGAG AGGCTGCAGA TAGTCAAAGG GAAGAGGCCA 

1701 TAGCTGACCA GCGGGAAGGG GCTGCAGGTA ATCAGAGGGC TGGGGCCCCA 

1751 GCTGACCAGG GGGCAGAGGC TGCAGATAAT CAGAGGGAAG AGGCTGCAGA 

1801 TAATCAGAGG GCAGGGGCCC CAGCTGAGGA GGGGGCAGAG GCTGCAGATA 

1851 ACCAGAGGGA AGAGGCTGCA GATAATCAGA GGGCAGAGGC CCCAGCTGAC 

1901 CAGAGGTCAC AGGGCACAGA TAACCACAGG GAAGAGGCTG CAGATAATCA 

1951 GAGGGCGGAG GCCCCAGCTG ACCAGGGGTC AGAGGTTACA GATAATCAAA 

2001 GGGAAGAGGC CGTACATGAC CAGAGGGAAA GGGCCCCAGC TGTCCAGGGT 

2051 GCAGATAATC AGAGGGCACA GGCCCGGGCT GGCCAGAGGG CAGAGGCTGC 

2101 ACATAATCAG AGGGCAGGGG CCCCAGGTAT CCAGGAAGCT GAAGTCTCAG 

2151 CTGCCCAAGG GACCACAGGA ACAGCTCCAG GAGCCAGGGC CCGGAAACAG 

2201 GTCAAGACAG TGAGGTTCCA GACCCCTGGA CGCTTTTCGT GGTTTTGCAA 

2251 GCGCCGGAGA GCCTTCTGGC ACACTCCCCG GTTGCCAACC CTGCCCAAGA 

2301 GAGTCCCCAG GGCAGGAGAG GTCAGGAACC TCAGGGTGCT GAGGGCCGAG 

2351 GCCAGAGCAG AAGCTGAGCA GGGAGAGCAA GAAGACCAGC TGTGAGGTGA 

2401 GGGCTAGAGA CAGCCCACGG GCCCTCCCTC CAAGTGTGGG AGGGAGAGAT 

2451 GCTCTGCCTC TGAACTTCAA AGTGGAGGTG GAGTGCTGGC CACGTCTCCA 

2501 CCTAACAACC CTCTTTATTC TCTTGTTAAA GTTTTGTTCA TGCTTTGATT 

2551 TTTTTTTAAA TTTTTTAGAG ACAGGGTCTC ACTCTGTTGC CCAGGCTGGA 

2601 GTGCAGTGGC ATGATCATAA CTCACTGCAG CCTCAAACTT CTGGCCTCAA 

2651 GTGATCCTCC TGCCTCGGCC TCCCAAAATG CTGGGATTAC AGATGTGAGC 
2701 CACCACACAC ACCATCTGAT TAAAAAAAAA AAATACTGAT TCCCTGTAGC 
2751 AACCCAAAAA AAAAAAAAAA AAAAA 
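The positions given for the poly(A) stretch and the polyadenylation signal of this clone match 0-based offsets. Counting from 1, the ATTAAA hexamer starts at base 2719 and the terminal A run at base 2756. These are offsets 2718 and 2755, the reported values. DKFZphfbr2_22i4 and DKFZphfbr2_22k8 follow the same pattern. The minimal sketch below recovers both numbers. It takes the signal to be the last AATAAA or ATTAAA upstream of the poly(A) run. That rule is an assumption that fits these clones, not a documented description of the original pipeline.

```python
from __future__ import annotations

import re
from typing import Optional

# Overlapping search for the two common polyadenylation signal hexamers
PAS = re.compile(r"(?=(AATAAA|ATTAAA))")


def polya_features(seq: str, offset: int = 0) -> tuple[Optional[int], int]:
    """Return 0-based (signal start, poly(A) start) for the 3' end of a cDNA.

    seq may be a 3' fragment; offset is the 0-based position of seq[0] in the
    full insert. The signal is the last AATAAA/ATTAAA lying entirely upstream
    of the terminal run of A's (an assumed rule); None if there is none.
    """
    seq = seq.upper()
    polya_start = len(seq.rstrip("A"))
    hits = [m.start() for m in PAS.finditer(seq[:polya_start])]
    signal = hits[-1] + offset if hits else None
    return signal, polya_start + offset


# Bases 2701-2775 of DKFZphfbr2_22k3, i.e. 0-based offset 2700
tail = ("CACCACACAC" "ACCATCTGAT" "TAAAAAAAAA" "AAATACTGAT"
        "TCCCTGTAGC" "AACCCAAAAA" "AAAAAAAAAA" "AAAAA")
print(polya_features(tail, offset=2700))  # (2718, 2755)
```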



BLAST Results 



Entry HS164A7F from database EMBL: 

H. sapiens CpG island DNA genomic MseI fragment, clone 164a7, forward 
read cpg164a7.ft1a. 
Score = 740, P = 3.0e-25, identities = 150/151 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 779 bp to 2392 bp; peptide length: 538 
Category: similarity to known protein 



1 MLQIGEDVDY LLIPREVRLA GGVWRVISKP ATKEAEFRER LTQFLEEEGR 
51 TLEDVARIME KSTPHPPQPP KKPKEPRVRR RVQQMVTPPP RLVVGTYDSS 
101 NASDSEFSDF ETSRDKSRQG PRRGKKVRKM PVSYLGSKFL GSDLESEDDE 
151 ELVEAFLRRQ EKQPSAPPAR RRVNLPVPMF EDNLGPQLSK ADRWREYVSQ 
201 VSWGKLKRRV KGWAPRAGPG VGEARLASTA VESAGVSSAP EGTSPGDRLG 
251 NAGDVCVPQA SPRRWRPKIN WASFRRRRKE QTAPTGQGAD IEADQGGEAA 
301 DSQREEAIAD QREGAAGNQR AGAPADQGAE AADNQREEAA DNQRAGAPAE 
351 EGAEAADNQR EEAADNQRAE APADQRSQGT DNHREEAADN QRAEAPADQG 
401 SEVTDNQREE AVHDQRERAP AVQGADNQRA QARAGQRAEA AHNQRAGAPG 
451 IQEAEVSAAQ GTTGTAPGAR ARKQVKTVRF QTPGRFSWFC KRRRAFWHTP 
501 RLPTLPKRVP RAGEVRNLRV LRAEARAEAE QGEQEDQL 

BLASTP hits 

Entry RNU67136_1 from database TREMBL: 

"A-kinase anchoring protein AKAP150"; Rattus norvegicus 
A-kinase anchoring protein AKAP150 mRNA, complete cds. Rattus 
norvegicus (Norway rat) 
Length = 714 

Score = 182 (64.1 bits), Expect = 1.2e-10, P = 1.2e-10 
Identities = 73/257 (28%), Positives = 104/257 (40%) 



Alert BLASTP hits for DKFZphfbr2_22k3, frame 2 

TREMBL:PFSANTY_1 product: "S-antigen"; Plasmodium falciparum KF1916 

S-antigen gene, complete cds., N = 1, Score = 178, P = 3.7e-11 



>TREMBL:PFSANTY_1 product: "S-antigen"; Plasmodium falciparum KF1916 
S-antigen gene, complete cds. 
Length = 285 

HSPs: 

Score = 178 (26.7 bits), Expect = 3.7e-11, P = 3.7e-11 
Identities = 60/217 (27%), Positives = 97/217 (44%) 

Query: 269 INWASFRRRRKEQTAPTGQGA-DIEADQGGEAADSQRE-EAIADQ---REGAAGNQRAGA 323 

+N + + + E G+G D E E +D+ E E I Q E A N+ AG+ 

Sbjct: 47 LNGKNGKGNKYEDLQEEGEGENDDEEHSNSEESDNDEENEIIVGQDGSNEKAGSNEEAGS 106 

Query: 324 PADQGAEAADNQREEAADNQRAGAPAEEGA--EAADNQR----EEAADNQRAEAPADQRS 377 

G+ E+A N++AG+ E G+ EA N+ EEA N++A + S 

Sbjct: 107 NEKAGSNEEAGSNEKAGSNEKAGSNEEAGSNEEAGSNEEAGSNEEAGSNEKAGSNEKAGS 166 

Query: 378 QGTDNHREEAADNQRAEAPADQGSEVTDNQREEAVHDQRERAPAVQGADNQRAQAR--AG 435 

EEA N++A + + GS E+A +++ + G+ N++A + AG 

Sbjct: 167 NEKAGSNEEAGSNEKAGSNEEAGSNEKAGSNEKAGSNEKAGSNEEAGS-NEKAGSNEEAG 225 

Query: 436 QRAEAAHNQRAGA---PGIQEAEVSAAQGTTGTA-PGA 469 
EA N+ AG+ G E + +G GT PG+ 
Sbjct: 226 SNEEAGSNEEAGSNEEAGSNEGSEAGTEGPKGTGGPGS 263 

Score = 173 (26.0 bits), Expect = 1.5e-10, P = 1.5e-10 
Identities = 51/190 (26%), Positives = 83/190 (43%) 

Query: 279 KEQTAPTGQ-GADIEADQGGEAADSQREEAIADQREGAAGNQRAGAPADQGAEAADNQRE 337 

+E GQ G++ +A EA +++ A E A N++AG+ G+ E 

Sbjct: 83 EENEIIVGQDGSNEKAGSNEEAGSNEK----AGSNEEAGSNEKAGSNEKAGSNEEAGSNE 138 

Query: 338 EAADNQRAGAPAEEGAEAADNQREEAADNQRAEAPADQRSQGTDNHREEAADNQRAEAPA 397 

EA N+ AG+ E G+ E+A N++A + + S EEA N++A + 

Sbjct: 139 EAGSNEEAGSNEEAGSNEKAGSNEKAGSNEKAGSNEEAGSNEKAGSNEEAGSNEKAGSNE 198 

Query: 398 DQGSEVTDNQREEAVHDQRERAPAVQGADNQRAQARAGQRAEAAHNQRAGAPGIQEAEVS 457 

GS EEA +++ + G++ + AG EA N+ AG+ EA 

Sbjct: 199 KAGSNEKAGSNEEAGSNEKAGSNEEAGSNEE-----AGSNEEAGSNEEAGSNEGSEAGTE 253 

Query: 458 AAQGTTGTAPG 468 

+GT G G 

Sbjct: 254 GPKGTGGPGSG 264 

Score = 147 (22.1 bits), Expect = 1.6e-07, P = 1.6e-07 
Identities = 40/168 (23%), Positives = 70/168 (41%) 

Query: 288 GADIEADQGGEAADSQR--EEAIADQREGAAGNQRAGAPADQGAEAADNQREEAADNQRA 345 

G++ EA +A +++ A E A N+ AG+ + G+ E+A N++A 

Sbjct: 111 GSNEEAGSNEKAGSNEKAGSNEEAGSNEEAGSNEEAGSNEEAGSNEKAGSNEKAGSNEKA 170 

Query: 346 GAPAEEGAEAADNQREEAADNQRAEAPADQRSQGTDNHREEAADNQRAEAPADQGSEVTD 405 

G+ E G+ EEA N++A + S EEA N++A + + GS 

Sbjct: 171 GSNEEAGSNEKAGSNEEAGSNEKAGSNEKAGSNEKAGSNEEAGSNEKAGSNEEAGSNEEA 230 

Query: 406 NQREEAVHDQR--ERAPAVQGADNQRAQARAGQRAEAAHNQRAGAPGI 451 

EEA ++ + G + + G E +HN++ I 

Sbjct: 231 GSNEEAGSNEEAGSNEGSEAGTEGPKGTGGPGSGGEHSHNKKKSKKSI 278 

Score = 101 (15.2 bits), Expect = 2.5e-02, P = 2.4e-02 
Identities = 26/100 (26%), Positives = 47/100 (47%) 

Query: 281 QTAPTGQGADIEADQGGEAADSQREEAIADQREGAAGNQRAGAPADQGAEAADNQREEAA 340 

+ A + + A + G EEA ++++ G+ N++AG+ G+ E+A 

Sbjct: 162 EKAGSNEKAGSNEEAGSNEKAGSNEEAGSNEKAGS--NEKAGSNEKAGSNEEAGSNEKAG 219 

Query: 341 DNQRAGAPAEEGAEAADNQREEAADNQRAEAPADQRSQGT 380 

N+ AG+ E G+ EEA N+ +EA + +GT 

Sbjct: 220 SNEEAGSNEEAGSNEEAGSNEEAGSNEGSEA-GTEGPKGT 258 
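Each HSP lists an Expect value E alongside a probability P. In BLAST statistics the number of chance hits at a given score is modelled as Poisson, so P = 1 - exp(-E). P is therefore practically identical to E when E is small and only falls below it as E grows. The Score = 101 HSP above shows this, with Expect = 2.5e-02 and P = 2.4e-02. Below is a minimal sketch. The unrounded E used in the example is an assumed value, because the report prints only two significant figures.

```python
import math


def p_from_expect(expect: float) -> float:
    """P = 1 - exp(-E), computed with expm1 to stay accurate for tiny E."""
    return -math.expm1(-expect)


# Assumed unrounded E behind the printed 'Expect = 2.5e-02, P = 2.4e-02' HSP
e = 0.02454
print(f"Expect = {e:.1e}, P = {p_from_expect(e):.1e}")  # Expect = 2.5e-02, P = 2.4e-02
```

Using math.expm1 avoids the cancellation error that 1 - math.exp(-E) would suffer for very small E.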





Pedant information for DKFZphfbr2_22k3, frame 2 



Report for DKFZphfbr2_22k3.2 



[LENGTH] 538 

[MW] 59402.19 

[pI] 8.72 

[HOMOL] TREMBL:AF037364_1 gene: "MAI"; product: "paraneoplastic neuronal antigen MAI"; Homo sapiens paraneoplastic neuronal antigen MAI (MAI) mRNA, complete cds. 4e-10 

[PROSITE] AMIDATION 1 

[PROSITE] MYRISTYL 12 

[PROSITE] CK2_PHOSPHO_SITE 11 

[PROSITE] PKC_PHOSPHO_SITE 6 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 18.03 % 



SEQ MLQIGEDVDYLLIPREVRLAGGVWRVISKPATKEAEFRERLTQFLEEEGRTLEDVARIME 

SEG 

PRD cccccccccccccccccccccceeeeeeecccchhhhhhhhhhhhhhhccchhhhhhhhh 

SEQ KSTPHPPQPPKKPKEPRVRRRVQQMVTPPPRLVVGTYDSSNASDSEFSDFETSRDKSRQG 

SEG xxxxxxxxxxxxxxxxxxx 

PRD hcccccccccccccccchhhhhhhhhccccceeeeecccccccccccccccccccccccc 

SEQ PRRGKKVRKMPVSYLGSKFLGSDLESEDDEELVEAFLRRQEKQPSAPPARRRVNLPVPMF 

SEG xxxxxxxxxxx 

PRD ccccccccccceeeccccccccccccchhhhhhhhhhhhhhccccccchhhhhccccccc 
SEQ EDNLGPQLSKADRWREYVSQVSWGKLKRRVKGWAPRAGPGVGEARLASTAVESAGVSSAP 

SEG 

PRD cccccccchhhhhhhhhheeeeccchhhhhhccccccccccchhhhhhhhhhhccccccc 

SEQ EGTSPGDRLGNAGDVCVPQASPRRWRPKINWASFRRRRKEQTAPTGQGADIEADQGGEAA 

SEG 

PRD cccccccccccccceeeecccccccccccchhhhhhhhhhhhhcccccchhhhhccchhh 

SEQ DSQREEAIADQREGAAGNQRAGAPADQGAEAADNQREEAADNQRAGAPAEEGAEAADNQR 

SEG xxxxxxxxxxxxx xxxxxxxxxxxx . . . . 

PRD hhhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhccccchhhhhhhhhhh 

SEQ EEAADNQRAEAPADQRSQGTDNHREEAADNQRAEAPADQGSEVTDNQREEAVHDQRERAP 

SEG 

PRD hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ AVQGADNQRAQARAGQRAEAAHNQRAGAPGIQEAEVSAAQGTTGTAPGARARKQVKTVRF 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD hhccchhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccccccchhhhhhhhhhh 

SEQ QTPGRFSWFCKRRRAFWHTPRLPTLPKRVPRAGEVRNLRVLRAEARAEAEQGEQEDQL 

SEG xxxxxxxxxxxxxx. . . 

PRD cccccceeehhhhhhhccccccccccccccccccchhhhhhhhhhhhhhhhhhhhccc 



Prosite for DKFZphfbr2_22k3.2 

PS00001   101->105  ASN_GLYCOSYLATION   PDOC00001 
PS00005   112->115  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   261->264  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   273->276  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   302->305  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   477->480  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   499->502  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   51->55    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   103->107  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   108->112  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   112->116  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   142->146  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   146->150  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   189->193  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   229->233  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   238->242  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   244->248  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   302->306  CK2_PHOSPHO_SITE    PDOC00006 
PS00008   95->101   MYRISTYL            PDOC00008 
PS00008   220->226  MYRISTYL            PDOC00008 
PS00008   242->248  MYRISTYL            PDOC00008 
PS00008   296->302  MYRISTYL            PDOC00008 
PS00008   314->320  MYRISTYL            PDOC00008 
PS00008   317->323  MYRISTYL            PDOC00008 
PS00008   328->334  MYRISTYL            PDOC00008 
PS00008   352->358  MYRISTYL            PDOC00008 
PS00008   400->406  MYRISTYL            PDOC00008 
PS00008   450->456  MYRISTYL            PDOC00008 
PS00008   461->467  MYRISTYL            PDOC00008 
PS00008   464->470  MYRISTYL            PDOC00008 
PS00009   123->127  AMIDATION           PDOC00009 

(No Pfam data available for DKFZphfbr2_22k3.2) 
DKFZphfbr2_22k8 



group: brain derived 

DKFZphfbr2_22k8 encodes a novel 172 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus: /raap="7" 

Insert length: 2789 bp 

Poly A stretch at pos. 2769, polyadenylation signal at pos. 2756 



1 GGGGGAGCCA TGAGGCGCCA GCCTGCGAAG GTGGCGGCGC TGCTGCTCGG 
51 GCTGCTCTTG GAGTGCACAG AAGCCAAAAA GCATTGCTGG TATTTCGAAG 
101 GACTCTATCC AACCTATTAT ATATGCCGCT CCTACGAGGA CTGCTGTGGC 
151 TCCAGGTGCT GTGTGCGGGC CCTCTCCATA CAGAGGCTGT GGTACTTCTG 
201 GTTCCTTCTG ATGATGGGCG TGCTTTTCTG CTGCGGAGCC GGCTTCTTCA 
251 TCCGGAGGCG CATGTACCCC CCGCCGCTGA TCGAGGAGCC AGCCTTCAAT 
301 GTGTCCTACA CCAGGCAGCC CCCAAATCCC GGCCCAGGAG CCCAGCAGCC 
351 GGGGCCGCCC TATTACACTG ACCCAGGAGG ACCGGGGATG AACCCTGTCG 
401 GGAATTCCAC GGCAATGGCT TTCCAGGTCC CACCCAACTC ACCCCAGGGG 
451 AGTGTGGCCT GCCCGCCCCC TCCAGCCTAC TGCAACACGC CTCCGCCCCC 
501 GTACGAACAG GTAGTGAAGG CCAAGTAGTG GGGTGCCCAC GTGCAAGAGG 
551 AGAGACAGGA GAGGGCCTTT CCCTGGCCTT TCTGTCTTCG TTGATGTTCA 
601 CTTCCAGGAA CGGTCTCGTG GGCTGCTAAG GGCAGTTCCT CTGATATCCT 
651 CACAGCAAGC ACAGCTCTCT TTCAGGCTTT CCATGGAGTA CAATATATGA 
701 ACTCACACTT TGTCTCCTCT GTTGCTTCTG TTTCTGACGC AGTCTGTGCT 
751 CTCACATGGT AGTGTGGTGA CAGTCCCCGA GGGCTGACGT CCTTACGGTG 
801 GCGTGACCAG ATCTACAGGA GAGAGACTGA GAGGAAGAAG GCAGTGCTGG 
851 AGGTGCAGGT GGCATGTAGA GGGGCCAGGC CGAGCATCCC AGGCAAGCAT 
901 CCTTCTGCCC GGGTATTAAT AGGAAGCCCC ATGCCGGGCG GCTCAGCCGA 
951 TGAAGCAGCA GCCGACTGAG CTGAGCCCAG CAGGTCATCT GCTCCAGCCT 
1001 GTCCTCTCGT CAGCCTTCCT CTTCCAGAAG CTGTTGGAGA GACATTCAGG 
1051 AGAGAGCAAG CCCCTTGTCA TGTTTCTGTC TCTGTTCATA TCCTAAAGAT 
1101 AGACTTCTCC TGCACCGCCA GGGAAGGATA GCACGTGCAG CTCTCACCGC 
1151 AGGATGGGGC CTAGAATCAG GCTTGCCTTG GAGGCCTGAC AGTGATCTGA 
1201 CATCCACTAA GCAAATTTAT TTAAATTCAT GGGAAATCAC TTCCTGCCCC 
1251 AAACTGAGAC ATTGCATTTT GTGAGCTCTT GGTCTGATTT GGAGAAAGGA 
1301 CTGTTACCCA TTTTTTTGGT GTGTTTATGG AAGTGCATGT AGAGCGTCCT 
1351 GCCCTTTGAA ATCAGACTGG GTGTGTGTCT TCCCTGGACA TCACTGCCTC 
1401 TCCAGGGCAT TCTCAGGCCC GGGGGTCTCC TTCCCTCAGG CAGCTCCAGT 
1451 GGTGGGTTCT GAAGGGTGCT TTCAAAACGG GGCACATCTG GCCGGGAAGT 
1501 CACATGGACT CTTCCAGGGA GAGAGACCAG CTGAGGCGTC TCTCTCTGAG 
1551 GTTGTGTTGG GTCTAAGCGG GTGTGTGCTG GGCTCCAAGG AGGAGGAGCT 
1601 TGCTGGGAAA AGACAGGAGA AGTACTGACT CAACTGCACT GACCATGTTG 
1651 TCATAATTAG AATAAAGAAG AAGTGGTCGG AAATGCACAT TCCTGGATAG 
1701 GAATCACAGC TCACCCCAGG ATCTCACAGG TAGTCTCCTG AGTAGTTGAC 
1751 GGCTAGCGGG GAGCTAGTTC CGCCGCATAG TTATAGTGTT GATGTGTGAA 
1801 CGCTGACCTG TCCTGTGTGC TAAGAGCTAT GCAGCTTAGC TGAGGCGCCT 
1851 AGATTACTAG ATGTGCTGTA TCACGGGGAA TGAGGTGGGG GTGCTTATTT 
1901 TTTAATGAAC TAATCAGAGC CTCTTGAGAA ATTGTTACTC ATTGAACTGG 
1951 AGCATCAAGA CATCTCATGG AAGTGGATAC GGAGTGATTT GGTGTCCATG 
2001 CTTTTCACTC TGAGGACATT TAATCGGAGA ACCTCCTGGG GAATTTTGTG 
2051 GGAGACACTT GGGAACAAAA CAGACACCCT GGGAATGCAG TTGCAAGCAC 
2101 AGATGCTGCC ACCAGTGTCT CTGACCACCC TGGTGTGACT GCTGACTGCC 
2151 AGCGTGGTAC CTCCCATGCT GCAGGCCTCC ATCTAAATGA GACAACAAAG 
2201 CACAATGTTC ACTGTTTACA ACCAAGACAA CTGCGTGGGT CCAAACACTC 
2251 CTCTTCCTCC AGGTCATTTG TTTTGCATTT TTAATGTCTT TATTTTTTGT 
2301 AATGAAAAAG CACACTAAGC TGCCCCTGGA ATCGGGTGCA GCTGAATAGG 
2351 CACCCAAAAG TCCGTGACTA AATTCCGTTT GTCTTTTTGA TAGCAAATTA 
2401 TGTTAAGAGA CAGTGATGGC TAGGGCTCAA CAATTTTGTA TTCCCATGTT 
2451 TGTGTGAGAC AGAGTTTGTT TTCCCTTGAA CTTGGTTAGA ATTGTGCTAC 
2501 TGTGAACGCT GATCCTGCAT ATGGAAGTCC CACTTTGGTG ACATTTCCTG 
2551 GCCATTCTTG TTTCCATTGT GTGGATGGTG GGTTGTGCCC ACTTCCTGGA 
2601 GTGAGACAGC TCCTGGTGTG TAGAATTCCC GGAGCGTCCG TGGTTCAGAG 
2651 TAAACTTGAA GCAGATCTGT GCATGCTTTT CCTCTGCAGC AATTGGCTCG 
2701 TTTCTCTTTT TTGTTCTCTT TTGATAGGAT CCTGTTTCCT ATGTGTGCAA 
2751 AATAAAAATA AATTTGGGCA AAAAAAAAAA AAAAAAAAA 

BLAST Results 



Entry HS671255 from database EMBL: 
human STS SHGC-11828. 
Length = 400
Minus Strand HSPs:

Score = 1822 (273.4 bits), Expect = 4.8e-76, P = 4.8e-76
Identities = 382/397 (96%), Positives = 382/397 (96%)



Medline entries 

No Medline entry 



Peptide information for frame 1 



ORF from 10 bp to 525 bp; peptide length: 172 
Category: putative protein 
Classification: unset 



1 MRRQPAKVAA LLLGLLLECT EAKKHCWYFE GLYPTYYICR SYEDCCGSRC 
51 CVRALSIQRL WYFWFLLMMG VLFCCGAGFF IRRRMYPPPL IEEPAFNVSY 
101 TRQPPNPGPG AQQPGPPYYT DPGGPGMNPV GNSTAMAFQV PPNSPQGSVA 
151 CPPPPAYCNT PPPPYEQVVK AK 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_22k8, frame 1

PIR:S14970 extensin class I (clone wl7-l) - tomato, N = 1, Score = 118,
P = 2.3e-07

>PIR:S14970 extensin class I (clone wl7-l) - tomato
Length = 132

HSPs:

Score = 118 (17.7 bits), Expect = 2.3e-07, P = 2.3e-07
Identities = 30/82 (36%), Positives = 35/82 (42%)

Query: 87 PPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQ 146

PPP P Y + PPPP PPYYPP+P + PSP

Sbjct: 32 PPPSPSPPP--PYYYKSPPPPSPSP--PPPYYYKSPPPPDPSPPPPYYYKSPPPPSPSPP 87

Query: 147 GSVACPPPPAYCNTPPPP--YEQV 168

PPPP Y + PPPP YE +
Sbjct: 88 PPSPSPPPPTYSSPPPPPPFYENI 111

Score = 104 (15.6 bits), Expect = 6.9e-06, P = 6.9e-06 
Identities = 28/78 (35%), Positives = 34/78 (43%)

Query: 87 PPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQ 146 

PP P + Y + PP P P P P YY P P +P ++ PP P 

Sbjct: 1 PPSPSPPPPY YYKSPPPPSPSP--PPPYYYKSPPPPSPSP PPPYYYKSPP-PPS 51 

Query: 147 GSVACPPPPAYCNTPPPP 164 

S PPPP Y +PPPP 
Sbjct: 52 PS PPPPYYYKSPPPP 66 

Score = 102 (15.3 bits), Expect = 1.1e-05, P = 1.1e-05
Identities = 30/78 (38%), Positives = 33/78 (42%) 

Query: 87 PPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQ 146

PPP P Y+PPPP PPYYPP+P S+ PPP
Sbjct: 48 PPPSPSPPP--PYYYKSPPPPDPSP--PPPYYYKSPPPPSPSPPPPSPS PP-PPT 97

Query: 147 GSVACPPPPAYCNTPPPP 164

S PPPP Y N P PP 
Sbjct: 98 YSSPPPPPPFYENIPLPP 115 

Score = 95 (14.3 bits), Expect = 2.4e-04, P = 2.4e-04
Identities = 24/61 (39%), Positives = 29/61 (47%)

Query: 104 PPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQGSVACPPPPAYCNTPPP 163 

PP+P P P P YY P P +P ++ PP P S PPPP Y + PPP 

Sbjct: 1 PPSPSP PPPYYYKSPPPPSPSP--PPPYYYKSPP-PPSPS--PPPPYYYKSPPP 49

Query: 164 P 164 
P 

Sbjct: 50 P 50 



Score = 68 (10.2 bits), Expect = 4.2e+00, P = 9.8e-01
Identities = 24/69 (34%), Positives = 29/69 (42%)

Query: 87 PPPLIEEPAFNVSYTRQPP--NPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPN 143

PPP P Y PP +P P + P PP Y+ P P P + + PP 
Sbjct: 63 PPPPDPSPPPPYYYKSPPPPSPSPPPPSPSPPPPTYSSPPPPP--PFYENIPL PPV 116 

Query: 144 SPQGSVACPPPP 155 

S A PPPP 
Sbjct: 117 IGV-SYASPPPP 127 



Peptide information for frame 3 



ORF from 0 bp to 368 bp; peptide length: 123 
Category: questionable ORF 
Classification: unset 



1 GSHEAPACEG GGAAARAALG VHRSQKALLV FRRTLSNLLY MPLLRGLLWL 
51 QVLCAGPLHT EAVVLLVPSD DGRAFLLRSR LLHPEAHVPP AADRGASLQC 
101 VLHQAAPKSR PRSPAAGAAL LH 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_22k8, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_22k8, frame 1



Report for DKFZphfbr2_22k8.1



[LENGTH] 172 

[MW] 19194.47 

[pI] 8.77

[KW] SIGNAL_PEPTIDE 23 

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 27.33 % 



SEQ MRRQPAKVAALLLGLLLECTEAKKHCWYFEGLYPTYYICRSYEDCCGSRCCVRALSIQRL 

SEG xxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhcccccccccceeeeccccccccccchhhhhhhhhh 

MEM 

SEQ WYFWFLLMMGVLFCCGAGFFIRRRMYPPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYT 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhccccceeeeecccccccccccccceeeeccccccccccccccccccc 

MEM . . . . MMMMMMMMMMMMMMMMM 

SEQ DPGGPGMNPVGNSTAMAFQVPPNSPQGSVACPPPPAYCNTPPPPYEQVVKAK 

SEG xxxxxx xxxxxxxxxxxxxxxx 

PRD ccccccccccccccceeecccccccccccccccccccccccccccccccccc 

MEM 



(No Prosite data available for DKFZphfbr2_22k8.1)

(No Pfam data available for DKFZphfbr2_22k8.1)



Pedant information for DKFZphfbr2_22k8, frame 3 



Report for DKFZphfbr2_22k8.3

[LENGTH] 122 
[MW] 12854.08 
[pI] 10.27
[KW] All_Alpha 

[KW] LOW_COMPLEXITY 25.41 % 

SEQ GSHEAPACEGGGAAARAALGVHRSQKALLVFRRTLSNLLYMPLLRGLLWLQVLCAGPLHT 

SEG . . . .xxxxxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhhccccchhhhhhhhhhhhhhhccccccchhhhhhhhcccccc 

SEQ EAVVLLVPSDDGRAFLLRSRLLHPEAHVPPAADRGASLQCVLHQAAPKSRPRSPAAGAAL 

SEG xxxxxxxxxxxxxxx. 

PRD cceeeeeccccchhhhhhhhccccccccccccccchhhhhhhhhccccccccchhhhhhc 

SEQ LH 
SEG 

PRD cc 

(No Prosite data available for DKFZphfbr2_22k8.3)
(No Pfam data available for DKFZphfbr2_22k8.3)

DKFZphfbr2_23b10



group: nucleic acid management

DKFZphfbr2_23b10 encodes a novel 580 amino acid protein with strong similarity to rat RNA
helicase HEL117.

HEL117 is a DEAD/H box helicase that co-localises with a splicing factor and is therefore
thought to be involved in splicing.

The new protein can find application in the modulation of splicing.

strong similarity to rat RNA helicase HEL117 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus: unknown

Insert length: 2905 bp 

Poly A stretch at pos. 2885, no polyadenylation signal found 

1 GGGGGCTCCG CTCCGCACCA CCAACCCCGG GCCGCAGTCC TGACGAGCGG 

51 GTCAGGGCTT GTCGGGCGGA AGCCTGGCCT GGAGCCTGGA AGGGGGAGAC 

101 GGCCCGAGCG GGAGCGGGAG CGGACGCGGC CTCAGTCCTG CGCGGAATAT 

151 TGAAGGATGT TTGTTCCAAG ATCTCTAAAA ATCAAGAGGA ATGCTAATGA 

201 TGATGGCAAA AGTTGTGTGG CTAAGATAAT TAAACCAGAC CCAGAAGACC 

251 TTCAGTTGGA CAAAAGCAGA GATGTTCCCG TTGATGCTGT AGCTACAGAA 

301 GCAGCCACAA TAGACAGGCA CATCAGCGAA TCATGCCCTT TCCCCAGCCC 

351 AGGTGGCCAG TTGGCAGAGG TTCATTCAGT AAGTCCCGAG CAGGGTGCGA 

401 AGGACAGCCA TCCTTCTGAA GAGCCCGTTA AGTCATTTTC CAAAACACAG 

451 CGCTGGGCAG AACCAGGGGA ACCCATCTGT GTTGTCTGTG GTCGTTATGG 

501 AGAGTATATC TGTGATAAGA CAGATGAAGA TGTGTGTAGT TTGGAGTGTA 

551 AAGCGAAACA TCTTCTACAA GTTAAGGAAA AGGAAGAGAA ATCAAAACTC 

601 AGCAATCCAC AGAAGGCTGA TTCTGAGCCA GAGTCTCCAC TGAATGCTTC 

651 CTATGTCTAC AAAGAGCACC CCTTTATTTT GAACCTTCAG GAAGACCAGA 

701 TTGAAAATCT TAAACAGCAG CTGGGAATTT TAGTTCAAGG GCAAGAAGTC 

751 ACCAGGCCCA TTATTGACTT TGAACATTGT AGTCTCCCTG AGGTCTTAAA 

801 TCACAACTTG AAGAAATCAG GCTATGAGGT GCCAACTCCC ATTCAAATGC 

851 AGATGATTCC TGTGGGACTT CTGGGAAGAG ACATTCTGGC CAGTGCAGAT 

901 ACTGGCTCAG GAAAAACAGC TGCTTTTCTT CTTCCTGTTA TCATGCGAGC 

951 TTTATTCGAG AGCAAAACTC CATCTGCGCT CATTCTTACA CCAACCAGAG 

1001 AGTTAGCCAT TCAGATAGAG AGACAAGCTA AAGAATTGAT GAGTGGCCTG 

1051 CCACGCATGA AAACTGTGCT TCTTGTAGGG GGCTTACCCT TACCCCCACA 

1101 GCTTTATCGT CTGCAACAAC ATGTTAAGGT TATCATAGCA ACCCCTGGGC 

1151 GACTTCTGGA TATAATAAAG CAGAGCTCTG TAGAACTCTG TGGTGTAAAG 

1201 ATTGTGGTAG TAGATGAAGC TGATACCATG TTAAAGATGG GTTTTCAACA 

1251 ACAAGTGCTT GACATTTTGG AAAACATTCC TAATGATTGT CAGACCATTT 

1301 TGGTTTCAGC CACAATTCCA ACTAGCATAG AACAGCTAGC AAGCCAGCTT 

1351 CTGCATAATC CTGTGAGAAT TATCACTGGA GAAAAGAACC TACCTTGTGC 

1401 CAATGTACGT CAGATTATTT TGTGGGTAGA AGACCCAGCC AAAAAGAAAA 

1451 AATTATTTGA AATTTTAAAT GATAAGAAAC TCTTTAAGCC TCCAGTGTTA 

1501 GTATTTGTGG ACTGCAAACT AGGAGCAGAT CTTTTGAGTG AAGCCGTTCA 

1551 GAAAATCACA GGGCTGAAAA GCATATCTAT ACATTCGGAG AAGTCGCAAA 

1601 TAGAAAGGAA AAACATATTG AAGGGATTAC TTGAAGGAGA CTATGAAGTT 

1651 GTAGTGAGCA CAGGAGTCTT GGGACGAGGC CTAGACTTGA TCAGTGTCAG 

1701 GCTGGTTGTC AATTTTGATA TGCCTTCAAG TATGGATGAG TATGTCCATC 

1751 AGGAAAATAC CTACAAGTCT ACTTGGAGGA ATCCCCAGCA TTTTCAACAG 

1801 GATGTCAGAA TGACCTTGGG CTATGTTGGC AAAGCACAAT GGGAAGAAGA 

1851 CAACCAATTG AAGGTCAAAC TAGGCCTTAA AAAAAATTGT TCTTCCTAAA 

1901 TGAAACTTTA TGTAAGACCC AAGCTTCCTT TATGTAAAAA TAGGATACTC 

1951 ACTAGGCTTT GGGGCTGACA ATGGTTTTTA AATCTTGCTA ATCTTCCCTG 

2001 GAATGAAACC AGCATGACTC AAAGAGAAAA AGAGAGTCTA TAATATTTTC 

2051 TAATCCCTGA GTTCTTTTCT TTATATATTA AAAAGGATTA TTAGGCTGGG 

2101 TGTGGTGGCT CACGCCTGTA ATCCCAGCAC TTTGGGAGGC CGAGGGGAGT 

2151 GGATCACCTG AGTTCGAGAC CAGCCTAACC AACATGGAGA AACCCTGTCT 

2201 CTACTAAAAA TACAAAATTA GCCAGGCGTG GTGGCGCATG CCTGTAATCC 

2251 CAGCTACTCA GGAGGCTACA GCAGGAGAAT TGCTTGAACT CGGGAGGCAG 

2301 AGCCAAGATC GCACCACTGC ACTCCAGCCT GGGCAACAAG AGTGAAACTC 

2351 TGTCTCAAAA TAATATTAAT GATAATAATA ATAATAATAA TAGGGATTAC 

2401 TTGCATAATT GTTCTTTTAA AATTATTGGC AGTATTGCTG AATGTATTTA 

2451 GATTTTTTCA CCAAGTGACA ACAACTGAAT TCATAAAGAT TCATCAACAA 

2501 GACCTGATAA AAAAAAATGT AAGCATATTA TAGTGGATAC TTCCAAGACT 

2551 CTTGGTCTAA CATGTATTAG AAAGCAGAAG GAGCCCAGGC ACAGGGGCTC 

2601 CCGCCGGTAA TCCCAAAGCT TTGGGAAGCC AAGGCAGGTG GATCGCTTGA 

2651 GCTCAGGAGT TAGAGACCAG CCTGGGCAAC ATGGTGAAAT CCCGTCACCA 
2701 CAAAAAAATG CAAAAATTAA CTGGGCGTGG TGGCATGCAC CTGTAGTCCC 
2751 AGCTACTCTG GAGGCTGAGG TGAGGGGAAT CACCTGAGCC GGGGGAATCA 
2801 CCTGAGCCCA GGGAAGTTGA GGCTGCTGTG AGCCATGGTC ATGACACTGC 
2851 CCTCCAGCCT GGACAACAGA TTGAGACCCT GTCTCAAAAA AAAAAAAAAA 
2901 AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



Medline: 

A putative mammalian RNA helicase with an arginine-serine-rich 
domain 



Peptide information for frame 1 



ORF from 157 bp to 1896 bp; peptide length: 580 
Category: strong similarity to known protein 
Prosite motifs: ATP_GTP_A (247-255) 
LEUCINE_ZIPPER (298-320) 
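The ORF coordinates in these reports can be checked with a little arithmetic. A minimal sketch, assuming the coordinates are 1-based, inclusive and exclude the stop codon: for this clone (1896 - 157 + 1) / 3 = 580, and the TAA at nt 1897-1899 of the listing closes the frame. The helper names (`translate`, `orf_peptide_length`) are illustrative only and not part of the annotation pipeline.

```python
# Standard genetic code; codons enumerated in TCAG order (1st, 2nd, 3rd base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon ('*' = stop, 'X' = codon with a non-ACGT base).

    Spaces are stripped, so 10-base groups copied from the listing can be used.
    """
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X")
                   for p in range(0, len(dna) - 2, 3))


def orf_peptide_length(start: int, end: int) -> int:
    """Residues encoded by an ORF given as 1-based, inclusive positions
    (assumed to exclude the stop codon)."""
    span = end - start + 1
    if span % 3:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3


if __name__ == "__main__":
    # nt 1852-1899 of DKFZphfbr2_23b10: the last 15 codons plus the TAA stop.
    tail = "AACCAATTGAAGGTCAAACTAGGCCTTAAAAAAAATTGTTCTTCCTAA"
    print(translate(tail))                 # NQLKVKLGLKKNCSS*
    print(orf_peptide_length(157, 1896))   # 580
```

The same rule gives 172 residues for DKFZphfbr2_22k8 (10 to 525 bp) and 193 for DKFZphfbr2_23b21 (121 to 699 bp).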



1 MFVPRSLKIK RNANDDGKSC VAKIIKPDPE DLQLDKSRDV PVDAVATEAA 

51 TIDRHISESC PFPSPGGQLA EVHSVSPEQG AKDSHPSEEP VKSFSKTQRW 

101 AEPGEPICVV CGRYGEYICD KTDEDVCSLE CKAKHLLQVK EKEEKSKLSN 

151 PQKADSEPES PLNASYVYKE HPFILNLQED QIENLKQQLG ILVQGQEVTR 

201 PIIDFEHCSL PEVLNHNLKK SGYEVPTPIQ MQMIPVGLLG RDILASADTG 

251 SGKTAAFLLP VIMRALFESK TPSALILTPT RELAIQIERQ AKELMSGLPR 

301 MKTVLLVGGL PLPPQLYRLQ QHVKVIIATP GRLLDIIKQS SVELCGVKIV 

351 VVDEADTMLK MGFQQQVLDI LENIPNDCQT ILVSATIPTS IEQLASQLLH 

401 NPVRIITGEK NLPCANVRQI ILWVEDPAKK KKLFEILNDK KLFKPPVLVF 

451 VDCKLGADLL SEAVQKITGL KSISIHSEKS QIERKNILKG LLEGDYEVVV 

501 STGVLGRGLD LISVRLVVNF DMPSSMDEYV HQENTYKSTW RNPQHFQQDV 

551 RMTLGYVGKA QWEEDNQLKV KLGLKKNCSS 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_23b10, frame 1

PIR:A57514 RNA helicase HEL117 - rat, N = 2, Score = 615, P = 1.6e-60

TREMBL:AB018344_1 gene: "KIAA0801"; product: "KIAA0801 protein"; Homo
sapiens mRNA for KIAA0801 protein, complete cds., N = 1, Score = 615, P
= 2.8e-59



TREMBL:CEF01F1_1 gene: "F01F1.7"; Caenorhabditis elegans cosmid 
F01F1., N = 2, Score = 365, P = 1.9e-58 

TREMBL:AF083255_1 product: "RNA helicase-related protein"; Homo 
sapiens RNA helicase-related protein mRNA, complete cds., N = 2, Score 
= 556, P = 1.5e-57 

PIR:S14048 RNA helicase dbp2 - fission yeast (Schizosaccharomyces
pombe), N = 1, Score = 591, P = 1.6e-57



>PIR:A57514 RNA helicase HEL117 - rat 
Length = 1,032 

HSPs: 

Score = 615 (92.3 bits), Expect = 1.6e-60, Sum P(2) = 1.6e-60
Identities = 140/394 (35%), Positives = 236/394 (59%)

Query: 144 EKSKLSNPQKADSEPESPLNASYVYKEHPFILNLQEDQIENLKQQL-GILVQGQEVTRPI 202

++ KL P P ++ Y E P + + ++++ + ++ GI V+G+ +PI 

Sbjct: 313 KQRKLLEPVDHGKIEYEPFRKNF-YVEVPELAKMSQEEVNVFRLEMEGITVKGKGCPKPI 371 

Query: 203 IDFEHCSLPEVLNHNLKKSGYEVPTPIQMQMIPVGLLGRDILASADTGSGKTAAFLLPV- 261 
+ C + + ++LKK GYE PTPIQ Q IP + GRD++ A TGSGKT AFLLP+

Sbjct: 372 [illegible in source] 431

Query: 262 --IM--RALFESKTPSALILTPTRELAIQIERQAKELMSGLPRMKTVLLVGGLPLPPQLY 317

IM R+L E + P A+I+TPTRELA+QI ++ K+ L ++ V + GG + Q+

Sbjct: 432 RUTMTViRQT FFfiFfiPTaVTMTPTRFLALOTTKFCKKFSKTLG-LRvVCVYGGTGT SFOT A 490

Query: 318 RLQQHVKVIIATPGRLLDIIKQSS VELCGVKIVVVDEADTMLKMGFQQQVLDILENI 374

L++ ++I+ TPGR++D++ +S L V VV+DEAD M MGF+ QV+ I++N+

Sbjct: 491 FT.KRfiAFT TVfTPfiRMT nMTiAAN^fiRVTNLRRVTYVVLDEADRMFDMGFEPOVMRI VDNV 550

Query: 375 PNDCQTILVSATIPTSIEQLASQLLHNPVRIITGEKNLPCANVRQIILWVEDPAKKKKLF 434

D QT++ SAT P ++E LA ++L P+ + G +++ C++V Q ++ +E+ K KL

Sbjct: 551 RPDRQTVMFSATFPRAMEALARRILSKPIEVQVGGRSVVCSDVEQQVIVIEEEKKFLKLL 610

Query: 435 EILNDKKLFKPPVLVFVDCKLGADLLSEAVQKITGLKSISIHSEKSQIERKNILKGLLEG 494

E+L + V++FVD + AD L + + + + +S+H Q +R +I+ G

Sbjct: 611 ELLGHYQE-SGSVIIFVDKQEHADGLLKDLMRAS-YPCMSLHGGIDQYDRDSIINDFKNG 668

Query: 495 DYEVVVSTGVLGRGLDLISVRLVVNFDMPSSMDEYVHQ 532

+++V+T V RGLD+ + LVVN+ P+ ++YVH+

Sbjct: 669 TCKLLVATSVAARGLDVKHLILVVNYSCPNHYEDYVHR 706

Score = 37 (5.6 bits), Expect = 1.6e-60, Sum P(2) = 1.6e-60
Identities = 13/36 (36%), Positives = 17/36 (47%)

Query: 132 KAKHLLQVKEKEE KSKLSNPQKADSEPESPLNA 164 

KA++ + KEK E SK K D E E +A 

Sbjct: 113 KAENRSRSKEKAEGGDSSKEKKKDKDDKEDEKEKDA 148 



Pedant information for DKFZphfbr2_23b10, frame 1

Report for DKFZphfbr2_23b10.1

[LENGTH] 580

[MW] 64572.24

[pI] 6.13

[HOMOL] TREMBL:CEF01F1_1 gene: "F01F1.7"; Caenorhabditis elegans cosmid F01F1. 8e-61

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL112w] 2e-53

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YNL112w] 2e-53

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119c] 5e-53

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 2e-49

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YOR204w] 2e-49

[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 2e-

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 3e-43

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 4e-39

[FUNCAT] 1 genome replication, transcription, recombination and repair [H. influenzae, HI0892] 3e-35

[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 6e-34

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 3e-32

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 6e-30

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194C] 5e-23

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064C] 1e-16

[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 5e-11

[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 1e-06

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 1e-06

[BLOCKS] BL00115B Eukaryotic RNA polymerase II heptapeptide repeat proteins

[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins

[PIRKW] nucleus 6e-53

[PIRKW] RNA binding 9e-52

[PIRKW] DEAD box 2e-43

[PIRKW] transmembrane protein 1e-21

[PIRKW] DNA binding 5e-48

[PIRKW] ATP 4e-57

[PIRKW] purine nucleotide binding 2e-43

[PIRKW] P-loop 4e-57

[PIRKW] hydrolase 6e-42

[PIRKW] protein biosynthesis 2e-43

[PIRKW] ATP binding 2e-50

[SUPFAM] WW repeat homology 1e-49

[SUPFAM] translation initiation factor eIF-4A 2e-43

[SUPFAM] DEAD/H box helicase homology 4e-57

[SUPFAM] recQ helicase homology 8e-06

[SUPFAM] unassigned DEAD/H box helicases 4e-[illegible in source]

[SUPFAM] ATP-dependent RNA helicase DBP1 2e-[illegible in source]

[SUPFAM] ATP-dependent RNA helicase DHH1 6e-40

[SUPFAM] tobacco ATP-dependent RNA helicase DB10 1e-49

[SUPFAM] Bloom's syndrome helicase 8e-06

[PROSITE] ATP_GTP_A 1

[PROSITE] LEUCINE_ZIPPER 1

[PROSITE] MYRISTYL 6

[PROSITE] CK2_PHOSPHO_SITE 8

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 7

[PROSITE] ASN_GLYCOSYLATION 1

[PFAM] Helicases conserved C-terminal domain

[PFAM] DEAD and DEAH box helicases

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 3.10 %







SEQ MFVPRSLKIKRNANDDGKSCVAKIIKPDPEDLQLDKSRDVPVDAVATEAATIDRHISESC 

SEG 

PRD ccccceeeeccccccccceeeeeeeeccccceeecccccccccchhhhhhhhhhhhcccc 

SEQ PFPSPGGQLAEVHSVSPEQGAKDSHPSEEPVKSFSKTQRWAEPGEPICVVCGRYGEYICD 

SEG 

PRD cccccccceeeeccccccccccccccccccccccccccccccccccceeeeccccceeec 

SEQ KTDEDVCSLECKAKHLLQVKEKEEKSKLSNPQKADSEPESPLNASYVYKEHPFILNLQED 

SEG 

PRD cccccccchhhhhhhhhhhhhhccccccccccccccccccccccceeeccccccccchhh 

SEQ QIENLKQQLGILVQGQEVTRPIIDFEHCSLPEVLNHNLKKSGYEVPTPIQMQMIPVGLLG 

SEG 

PRD hhhhhhhhheeeeccccccccccccccccchhhhhhhhhhhccccccccccccceeeecc 

SEQ RDILASADTGSGKTAAFLLPVIMRALFESKTPSALILTPTRELAIQIERQAKELMSGLPR 

SEG 

PRD cceeeeeccccccceeeehhhhhhhhcccccceeeeecchhhhhhhhhhhhhhhhccccc 

SEQ MKTVLLVGGLPLPPQLYRLQQHVKVIIATPGRLLDIIKQSSVELCGVKIVVVDEADTMLK 

SEG . . . xxxxxxxxxxxxxxxxxx 

PRD eeeeeeecccccchhhhhhhhheeeeeeccccchhhhhhheeeeeeeeeeeehhhhhhhh 

SEQ MGFQQQVLDILENIPNDCQTILVSATIPTSIEQLASQLLHNPVRIITGEKNLPCANVRQI 

SEG 

PRD cccchhhhhhhhhcccccceeeeecccchhhhhhhhhhhhceeeeeeeccccccccccce 

SEQ ILWVEDPAKKKKLFEILNDKKLFKPPVLVFVDCKLGADLLSEAVQKITGLKSISIHSEKS 

SEG 

PRD eeecccchhhhhhhhhhhhhccccceeeeeeecccchhhhhhhhhhhhccceeeccccch 

SEQ QIERKNILKGLLEGDYEVVVSTGVLGRGLDLISVRLVVNFDMPSSMDEYVHQENTYKSTW 

SEG 

PRD hhhhhhhhhhhccccceeeeehhhhhhcccceeeeeeeeecccccccceeeecccccccc 

SEQ RNPQHFQQDVRMTLGYVGKAQWEEDNQLKVKLGLKKNCSS 

SEG 

PRD ccccccchhhhhhhccccchhhhhhhhhhhhhhhcccccc 
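The LOW_COMPLEXITY keyword appears to be the share of residues that SEG masks with "x": 3.10 % of 580 residues corresponds to 18 masked residues (18/580 = 3.10 %), and the same rule reproduces the 27.33 % (47/172) and 25.41 % (31/122) reported for DKFZphfbr2_22k8. A minimal sketch under that assumption; the function name is hypothetical.

```python
from typing import Iterable


def low_complexity_percent(seg_rows: Iterable[str], length: int) -> float:
    """Share of residues masked by SEG, as a percentage rounded to two decimals.

    Assumes each masked residue appears as exactly one 'x' in the SEG rows;
    dots and spaces (unmasked positions or layout) are ignored.
    """
    masked = sum(row.count("x") for row in seg_rows)
    return round(100.0 * masked / length, 2)


if __name__ == "__main__":
    # 18 masked residues out of the 580 in DKFZphfbr2_23b10.1
    print(low_complexity_percent(["x" * 18], 580))   # 3.1
```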



Prosite for DKFZphfbr2_23b10.1

PS00001 163->167 ASN_GLYCOSYLATION PDOC00001
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005
PS00005 97->100 PKC_PHOSPHO_SITE PDOC00005
PS00005 251->254 PKC_PHOSPHO_SITE PDOC00005
PS00005 477->480 PKC_PHOSPHO_SITE PDOC00005
PS00005 513->516 PKC_PHOSPHO_SITE PDOC00005
PS00005 535->538 PKC_PHOSPHO_SITE PDOC00005
PS00005 539->542 PKC_PHOSPHO_SITE PDOC00005
PS00006 122->126 CK2_PHOSPHO_SITE PDOC00006
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006
PS00006 209->213 CK2_PHOSPHO_SITE PDOC00006
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006
PS00006 340->344 CK2_PHOSPHO_SITE PDOC00006
PS00006 389->393 CK2_PHOSPHO_SITE PDOC00006
PS00006 480->484 CK2_PHOSPHO_SITE PDOC00006
PS00006 524->528 CK2_PHOSPHO_SITE PDOC00006
PS00007 489->497 TYR_PHOSPHO_SITE PDOC00007
PS00008 66->72 MYRISTYL PDOC00008
PS00008 80->86 MYRISTYL PDOC00008
PS00008 195->201 MYRISTYL PDOC00008
PS00008 250->256 MYRISTYL PDOC00008
PS00008 490->496 MYRISTYL PDOC00008
PS00008 573->579 MYRISTYL PDOC00008
PS00017 247->255 ATP_GTP_A PDOC00017
PS00029 298->320 LEUCINE_ZIPPER PDOC00029



Pfam for DKFZphfbr2_23b10.1

HMM_NAME DEAD and DEAH box helicases

HMM *gLpPWILRnIyeMGFEkPTPIQQqAIPiILeGRDVMACAQTGSGKTAAF
+LP+ + N+++ G+E PTPIQ+Q IP+ L GRD++A A TGSGKTAAF
Query 209 SLPEVLNHNLKKSGYEVPTPIQMQMIPVGLLGRDILASADTGSGKTAAF 257

HMM UPMLQHIDwdPWpqpPQdPrALILAPTRELAMQIQEEcRkFgkHMnglR
L+P++ + + + ++P ALIL+PTRELA+QI+++++++ + ++ ++
Query 258 LLPVIMRALFES--KTPS ALILTPTRELAIQIERQAKELMSGLPRMK 302

HMM ImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIERgtldLDrleMLV
++++GG+++ +Q+ +L++ + ++IATPGRL+D+I++ ++ L ++++V
Query 303 TVLLVGGLPLPPQLYRLQQHV-KVIIATPGRLLDIIKQSSVELCGVKIVV 351

HMM MDEADRMLDMGFIDQIRrIMrqlPMpwNRQTMMFSATMPdeIqELARrFM
DEAD ML MGF++Q+ +I+ IP + QT++ SAT+P +I++LA ++
Query 352 VDEADTMLKMGFQQQVLDILENIP--NDCQTILVSATIPTSIEQLASQLL 399

HMM RNPIRInldMdElTtnEnlkQwYiyVerEMWKfdcLcrLIe* 

+NP+RI+ ++++L N++Q++ +VE + K +L+++++ 
Query 400 HNPVRIITGEKNLPCA-NVRQIILWVE-DPAKKKKLFEILN 438 



HMM_NAME Helicases conserved C-terminal domain

HMM *EileeWLknl.GIrvmYIHGdMpQeERdelMddFNnGEynVLIcTDVgg
++L+E ++ G++ ++IH+ ++Q ER + + +G+Y V ++T V+.G
Query 458 DLLSEAVQKITGLKSISIHSEKSQIERKNILKGLLEGDYEVVVSTGVLG 506

HMM RGIDIPdVNHVINYDMPWNPEqYIQRIGRTgRIG*
RG+D+++V++V+N+DMP +++ Y++ + T +
Query 507 RGLDLISVRLVVNFDMPSSMDEYVH-QENTYKST 539

DKFZphfbr2_23b21



group: signal transduction 

DKFZphfbr2_23b21.1 encodes a novel 193 amino acid protein that is nearly identical to bovine
neurocalcin.

Neurocalcin is a Ca2+-binding protein with three putative Ca2+-binding domains (EF-hands).
In cattle, six isoforms are differentially expressed in the central nervous system, retina and
adrenal gland. Its homology with recoverin suggests involvement in the Ca2+-dependent
activation of guanylate cyclase.

The new protein can find application in modulating or blocking the guanylate cyclase pathway.



nearly identical to bovine neurocalcin 

complete cds, complete cDNA
EST hits 

Sequenced by AGOWA 

Locus: /map="574.6 cR from top of Chr8 linkage group" 
Insert length: 3300 bp 

Poly A stretch at pos. 3279, polyadenylation signal at pos. 3249 
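The positions in this line appear to be 0-based offsets: the terminal A-run begins at the 3280th base of the listing (offset 3279), and the last AATAAA hexamer before it begins at the 3250th base (offset 3249). A minimal sketch of that check, assuming a plain AATAAA search within a hypothetical 50-nt window upstream of the tail; the pipeline's actual search rule is not stated in the source.

```python
from typing import Optional


def polya_start(seq: str) -> int:
    """0-based offset at which the terminal run of A's begins."""
    i = len(seq)
    while i > 0 and seq[i - 1] == "A":
        i -= 1
    return i


def polya_signal(seq: str, window: int = 50) -> Optional[int]:
    """0-based offset of the last AATAAA lying wholly within `window` nt
    upstream of the poly-A tail, or None if there is none."""
    tail = polya_start(seq)
    hit = seq.rfind("AATAAA", max(0, tail - window), tail)
    return None if hit == -1 else hit


if __name__ == "__main__":
    # Last 100 nt of DKFZphfbr2_23b21 (listing positions 3201-3300, offset 3200).
    end = ("AACCTGTTCTGTCCCAAATAAAACCAGCCTGTGATGTTCAAGGGACTGGA"
           "ATAAAGTGGCTTACGACCTGAAGGATTCTA" + "A" * 20)
    print(3200 + polya_start(end), 3200 + polya_signal(end))   # 3279 3249
```

Under the same assumptions the DKFZphfbr2_23b10 tail starts at offset 2885 with no AATAAA in the window, matching "no polyadenylation signal found".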



1 GGGGAGAATC TGGTGGATGC TGGACCTTGC TGCTGCTGCT ACTGCTGTTT 

51 CCAGGGGCTG CAGAGCATGG ACTGTTAAAT CTTGCACTTC TTCTGAGTGA 

101 GCTGAATTCT TGCCGCCAGG ATGGGGAAAC AGAACAGCAA GCTGCGCCCG 

151 GAGGTCATGC AGGACTTGCT GGAAAGCACA GACTTTACAG AGCATGAGAT 

201 CCAGGAATGG TATAAAGGCT TCTTGAGAGA CTGCCCCAGT GGACATTTGT 

251 CAATGGAAGA GTTTAAGAAA ATATATGGGA ACTTTTTCCC TTATGGGGAT 

301 GCTTCCAAAT TTGCAGAGCA TGTCTTCCGC ACCTTCGATG CAAATGGAGA 

351 TGGGACAATA GACTTTAGAG AATTCATCAT CGCCTTGAGT GTAACTTCGA 

401 GGGGGAAGCT GGAGCAGAAG CTGAAATGGG CCTTCAGCAT GTACGACCTG 

451 GACGGAAATG GCTATATCAG CAAGGCAGAG ATGCTAGTGA TCGTGCAGGC 

501 AATCTATAAG ATGGTTTCCT CTGTAATGAA AATGCCTGAA GATGAGTCAA 

551 CCCCAGAGAA AAGAACAGAA AAGATCTTCC GCCAGATGGA CACCAATAGA 

601 GACGGAAAAC TCTCCCTGGA AGAGTTCATC CGAGGAGCCA AAAGCGACCC 

651 GTCCATTGTG CGCCTCCTGC AGTGCGACCC GAGCAGTGCC GGCCAGTTCT 

701 GAGCCCTGCG CCCACCAATC GAATTGTAGA GCTGCTTGTG TTCCCTTTTG 

751 ATTCTTCTTT TTAACAATTT TTTTTTTTTT TTGCCAAACA ATATCAATGG 

801 TGATGCCGTC CCCTGTGCGG TCTGATGCGC CTTCCTCCGT GACGCCTTCA 

851 GCCTCTTTTG TGGTGGATGC TTCGTGGGAA TGCCCAGAGC CCCAGTGTGC 

901 TTGTGGAGAG CATGGACAGA CTTCGTGGTG TTCATTGTTT GATGATTTTT 

951 AATCGTTACT ATTATTTCTT TTTATTCTAA TGTCTCTGTT CTAAAACGTA 

1001 AGACTCGGGG GTTGGGGCAA AAGAAGGGAA ACCCATCCAG TCCTGTGATT 

1051 CTATTGCAAG CTTCAAGGGG CTTTTGTTTG AAAGACAAAA CTCCCCACCT 

1101 GGGTCTGTTG TCACACGTGC CGTAGGGGTG ATGGATGGCA CCGGATGCTG 

1151 GATTCCCCAA GAACAAGTTA CCCTCTGGGG TGAGGCTATT CCAGCGAGCT 

1201 GGGACATTTC CCCATGGGGG CCCACTCCCC TCTCTTCCCC AGCAGGCTGT 

1251 AGTTTCTAAG CTGTGAACAT TTCAAGATAA ATTAACAGAG GAGAGGAAAA 

1301 AGATGGCTCA GCTATTTTTT CACAGGTTTA CACTAGTTGA GCTAATATGC 

1351 GTGTCTTTGG AAATTAAACA CAAATGGTAA CATATTCCAA AACCAGACCC 

1401 ATCTTGTTGC CTATTGTGAT AAAATAAAAA GACGGCTGTA TATAACATAT 

1451 TGGGTAATGC AGACCAAATT AAGTGTTTTG CCTTGTTTAA ATGAAATGCA 

1501 TGTTTAGTGA GCACTAATAC AATCTTATTC CAGAAGACTG TTTTTAGTAG 

1551 CTTATTGTGA AGTAAGACAA CTATAATGAA TGTCTGTCTT GTTTGGAAGT 

1601 CATATCTGTC TTTGCACAAA TGTACCAATC GACAAGTATA TTTTATATAT 

1651 TCCATAAAAA TACAAAGTAA CCCTGACTAG GGCCCAACTT TAATTTTGAA 

1701 TGCATTTCCA GAGTGGCCAT GCCTAGAGGG CAGATGCAGA GCAGGTGGTA 

1751 GTGGGACAGG ACAATTGGAG CACAGGAATG TTAACATGTA TGACAGGGGA 

1801 CCAGTAGGGT GGTTTCCCTC TCAGGCCCAG CAGCCCATTG ACAGCATTAG 

1851 ACTGGCGGCA TGGTGCTTTT CTGAGCAGAT CAATACTCTG CAGACTCGAA 

1901 AAAACATCAC ATACATTCTT GGAACTTCCC AGTGGTTTAA TCTATGTGCA 

1951 TGGTTAGGGA GCCAGGCCTG GAATATTCAG TTTCCCTGCC CCTGTTAAAG 

2001 AATCAGAGGT TGGGCAGTCA TCAAATTCAT CATAAAGACA TGGGCAAGTG 

2051 TGTCTGTGGT TTCCAAGGCC CCCCTATGGA GAATCCAAAA GTATTTTCCA 

2101 TTGCCGTGCT CTTTGAATGC AGACTTCTAT TTCCAGAAGT GACAGCACAA 

2151 GTCTGAGTTG CTGTTTGGTC TGGTGACCTC AGACACACTA ATTTGAATTG 

2201 AAAGCTAAGA GTAAAAATTT GCTGGTTACA GGCGAGTCAT ACTCTTGCAA 

2251 GTAGTTAGCA AAGGGAGGCC CAAATTCTCA AGGTTGTTGA TGGGGAACTT 

2301 GCCACTAAGA GAAGGCAGAG AGGTCCCTAG TGGGTATATT TGCTGCCAAG 

2351 CCACTTGCCA AAGAAGAGGA ACCACAGAAA GAGAGACATC ATGACCAGGA 

2401 GAAAAATGTG ACTAGACATG CTAACCTCCA GGTTTTTATA TATGACTTGA 

2451 GTCTGCTGTA ATTGGCAGCA GAAATCCAAA TTTGTATGGT AGACCAAAAA 

2501 GAACCAAATC CATAGGGTGA AATTTTGAGA CCTAGACTCT GTAAAAATAA 
2551 TCCTAGTCTT CCTCCAGGGG TCAGTTCCTC ACAGTGGTTC TGTACCAAAA 
2601 CTTGCCAAAT TCCTCCATGG CCAAGTGTTA AAATCTGTGT TTGGAAAATA 
2651 GCGAATTAAC CTAAGACACA GAAGGCAGAC TGGGTGAGGA GACCTAGCAT 
2701 GCCCTATTGG CAGTGCTCAG GAGCTGCATC CCACTTTTCC CTGCTCTGAA 
2751 TCGAAGTCCT AGTTCCTTCC TTTGATTCTC CTTTGGTAGG TGGAATCAGT 
2801 TAATGTTTTG AGAAACCTGC CTGGGCTCTG CCCTTAGTCA TGACATCTCG 
2851 CTGAGCCAGA CCCACTCTGT TCCTTGGAAC CTAGAGCTGG AGTGAGGAGT 
2901 AGAGGTCTCC GGCTATTCCA GAAAGAAAAG TGAGCCACAT GCAGGCTGAT 
2951 GAATGCCGAC ACTTCCAGAA TGTATAGAAA TAGTCCCTGT CCTGGCCTGC 
3001 CACTGACCCT GTCTGTATTT TCTCGGAGGT TGTTTTTCTC CTTCTCCTTC 
3051 CCAGGAAGGT CTTTGTATGT CGAATCCAGT GCACTCAAGT TTGGCCAAGG 
3101 GACTCCACAG CACCCAGAGG ACTGCATGCC TCAAGGTTTA TGTCACTCCT 
3151 CTGCTGGGCT GTTCATTGTC ATTGCTGTGT TCAGGGACCT TTGGAAATAA 
3201 AACCTGTTCT GTCCCAAATA AAACCAGCCT GTGATGTTCA AGGGACTGGA 
3251 ATAAAGTGGC TTACGACCTG AAGGATTCTA AAAAAAAAAA AAAAAAAAAA 



BLAST Results 



Entry HS431350 from database EMBL: 
human STS WI-15914. 
Score = 1308, P = 3.1e-53, identities = 276/285

Entry HSG19929 from database EMBL:
human STS A002C26.
Score = 926, P = 1.5e-35, identities = 186/187

Entry AF052142 from database EMBL:

Homo sapiens Clone 24665 mRNA sequence.

Score = 7378, P = 0.0e+00, identities = 1482/1487

3' UTR 



Medline entries 



93247712: 

Neurocalcin family: a novel calcium-binding protein abundant in bovine central nervous
system.

94045365:

Distinct regional localization of neurocalcin, a Ca(2+)-binding protein, in the bovine
adrenal gland.

96407688:

Crystallization and preliminary X-ray crystallographic studies of recombinant bovine
neurocalcin delta.

96066284:

Distribution pattern of three neural calcium-binding proteins (NCS-1, VILIP and recoverin)
in chicken, bovine and rat retina.



Peptide information for frame 1 



ORF from 121 bp to 699 bp; peptide length: 193 
Category: strong similarity to known protein 
Prosite motifs: EF_HAND (73-86)
EF_HAND (109-122)
EF_HAND (157-170)



1 MGKQNSKLRP EVMQDLLEST DFTEHEIQEW YKGFLRDCPS GHLSMEEFKK 
51 IYGNFFPYGD ASKFAEHVFR TFDANGDGTI DFREFIIALS VTSRGKLEQK 
101 LKWAFSMYDL DGNGYISKAE MLVIVQAIYK MVSSVMKMPE DESTPEKRTE 
151 KIFRQMDTNR DGKLSLEEFI RGAKSDPSIV RLLQCDPSSA GQF 

BLASTP hits 

Entry JH0616 from database PIR:
neurocalcin (clone pCalN) - bovine
Score = 1001, P = 5.2e-101, identities = 192/193, positives = 192/193

Entry GGU91630_1 from database TREMBL:
product: "neurocalcin"; Gallus gallus neurocalcin mRNA, complete cds.
Score = 998, P = 1.1e-100, identities = 191/193, positives = 192/193

Entry NECD_BOVIN from database SWISSPROT:
NEUROCALCIN DELTA.
Score = 996, P = 1.8e-100, identities = 191/192, positives = 191/192

Entry S47565 from database PIR:
BDR-1 protein - human
Score = 934, P = 6.6e-94, identities = 174/193, positives = 187/193

Entry I50676 from database PIR:
gene Rem-1 protein - chicken >TREMBL:GGREM1_1 gene: "Rem-1"; G. gallus rem-1 mRNA
Score = 933, P = 8.4e-94, identities = 174/193, positives = 186/193



Alert BLASTP hits for DKFZphfbr2_23b21, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_23b21, frame 1 



Report for DKFZphfbr2_23b21.1



[LENGTH] 193 

[MW] 22215.30 

[pI] 5.35

[HOMOL] PIR:JH0616 neurocalcin (clone pCalN) - bovine 1e-109

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YDR373w] 3e-54

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKL190w] 2e-18

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins
[S. cerevisiae, YKL190w] 2e-18

[FUNCAT] 03.01 cell growth [S. cerevisiae, YKL190w] 2e-18

[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YKL190w] 2e-18

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL190w] 2e-18

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YBR109c] 0.001

[FUNCAT] 08.19 cellular import [S. cerevisiae, YBR109c] 0.001

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YBR109c] 0.001

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YBR109c]
0.001

[FUNCAT] 10.02.99 other morphogenetic activities [S. cerevisiae, YBR109c] 0.001

[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YBR109c] 0.001

[BLOCKS] BL00018

[SCOP] d1rec 1.34.1.5.18 Recoverin [bovine (Bos taurus) 8e-55

[SCOP] d1jsa 1.34.1.5.17 Recoverin [human (Homo sapiens) 5e-58

[SCOP] d1tcob_ 1.34.1.5.16 Calcineurin regulatory subunit (B-chain 1e-06

[SCOP] d2mysc_ 1.34.1.5.15 Myosin Regulatory Chain [chicken (Gallu 2e-29

[SCOP] d1scmc_ 1.34.1.5.14 Myosin Regulatory Chain [bay scallo 5e-33

[SCOP] d2mysb_ 1.34.1.5.13 Myosin Essential Chain [chicken (Gallu 4e-26

[SCOP] d1scmb_ 1.34.1.5.12 Myosin Essential Chain [bay scallo 6e-27

[SCOP] d1clm 1.34.1.5.11 Calmodulin [Paramecium tetraurelia 1e-15

[SCOP] d4cln 1.34.1.5.10 Calmodulin [Drosophila melanogaster 2e-16

[SCOP] d1cfc 1.34.1.5.9 Calmodulin [African frog (Xenopus laevis) 2e-16

[SCOP] d1ahr 1.34.1.5.8 Calmodulin [chicken gallus gallus 4e-16

[SCOP] d3cln 1.34.1.5.7 Calmodulin [rat (Rattus rattus) 2e-16

[SCOP] d1trcb_ 1.34.1.5.6 Calmodulin [bovine (Bos taurus) 8e-08

[SCOP] d1cll 1.34.1.5.5 Calmodulin [human (Homo sapiens) 2e-16

[SCOP] d1rtpl_ 1.34.1.4.5 Parvalbumin [rat (Rattus rattus) 8e-06

[SCOP] d5tnc 1.34.1.5.2 Troponin C [turkey (Meleagris gallopavo) 3e-13

[SCOP] d1pvaa_ 1.34.1.4.3 Parvalbumin [pike (Esox lucius) 6e-06

[SCOP] d1tnp 1.34.1.5.1 Troponin C [chicken (Gallus gallus) 9e-11

[EC] 2.7.1.107 Diacylglycerol kinase 2e-08

[PIRKW] blocked amino end 1e-100

[PIRKW] phosphotransferase 2e-08

[PIRKW] duplication 4e-17

[PIRKW] tandem repeat 7e-06

[PIRKW] heterodimer 4e-17

[PIRKW] heart 6e-09

[PIRKW] zinc 2e-08

[PIRKW] serine/threonine-specific protein kinase 1e-06

[PIRKW] muscle contraction 1e-08

[PIRKW] acetylated amino end 4e-09

[PIRKW] ATP 2e-08

[PIRKW] skeletal muscle 6e-09
[PIRKW] [illegible in source]

[PIRKW] [illegible in source]

[PIRKW] calcium binding 1e-100

[PIRKW] alternative splicing 2e-13

[PIRKW] [illegible in source]

[PIRKW] [illegible in source]

[PIRKW] [illegible in source]

[PIRKW] cardiac muscle 6e-09

[PIRKW] [illegible in source]

[PIRKW] [illegible in source]

[PIRKW] [illegible in source]

[PIRKW] [illegible in source]

[SUPFAM] [illegible in source]

[SUPFAM] [illegible in source]

[SUPFAM] [illegible in source]

[SUPFAM] calmodulin repeat homology 1e-101

[SUPFAM] human diacylglycerol kinase 2e-08

[SUPFAM] protein kinase C zinc-binding repeat homology 2e-08

[SUPFAM] protein kinase homology 2e-08

[SUPFAM] calmodulin 1e-101

[PROSITE] EF_HAND 3

[PROSITE] CK2_PHOSPHO_SITE 7

[PROSITE] PKC_PHOSPHO_SITE 3

[PFAM] EF hand

[KW] All_Alpha

[KW] 3D



SEQ MGKQNSKLRPEVMQDLLESTDFTEHEIQEWYKGFLRDCPSGHLSMEEFKKIYGNFFPYGD

1rec- HHHHHHHHHTTTTCCCHHHHHHHHHHHHHHTTTTEEEHHHHHHHHHHHTTTTC

SEQ ASKFAEHVFRTFDANGDGTIDFREFIIALSVTSRGKLEQKLKWAFSMYDLDGNGYISKAE

1rec- HHHHHHHHHHHH CEEEHHHHHHHHHHHHCCCGGGHHHHHHHHHTTTTCCCEEHHH

SEQ MLVIVQAIYKMVSSVMKMPEDESTPEKRTEKIFRQMDTNRDGKLSLEEFIRGAKSDPSIV

1rec- HHHHHHHHHHCCTTGGGCTTTTTCHHHHHHHHHHHHCCTTTTEECHHHHHHHHHHCHHHH

SEQ RLLQCDPSSAGQF

1rec- HHHCCCH



Prosite for DKFZphfbr2_23b21.1



PS00005 92->95 PKC_PHOSPHO_SITE PDOC00005
PS00005 149->152 PKC_PHOSPHO_SITE PDOC00005
PS00005 158->161 PKC_PHOSPHO_SITE PDOC00005
PS00006 23->27 CK2_PHOSPHO_SITE PDOC00006
PS00006 44->48 CK2_PHOSPHO_SITE PDOC00006
PS00006 106->110 CK2_PHOSPHO_SITE PDOC00006
PS00006 117->121 CK2_PHOSPHO_SITE PDOC00006
PS00006 143->147 CK2_PHOSPHO_SITE PDOC00006
PS00006 158->162 CK2_PHOSPHO_SITE PDOC00006
PS00006 165->169 CK2_PHOSPHO_SITE PDOC00006
PS00018 73->86 EF_HAND PDOC00018
PS00018 109->122 EF_HAND PDOC00018
PS00018 157->170 EF_HAND PDOC00018



Pfam for DKFZphfbr2_23b21.1



HMM_NAME EF hand 

HMM *MFrmMDkDGDGyIDFEEFmeMMkem* 

+FR +D +GDG+IDF EF+ +++ 
Query 68 VFRTFDANGDGTIDFREFIIALSVT 92



30.75 100 128 1 29 dkfzphfbr2_23b21.1 nearly identical to bovine neurocalcin

Alignment to HMM consensus:
HMM *ElqEMFrmMDkDGDGyIDFEEFmeMMkem*

++++F+M+D DG+GYI++ E++++++++ 

dkfzphfbr2 100 KLKWAFSMYDLDGNGYISKAEMLVIVQAI 128 

Query 176 1 29 dkfzphfbr2_23b21.1 nearly identical to bovine neurocalcin

Alignment to HMM consensus:
HMM *ElqEMFrmMDkDGDGyIDFEEFmeMMkem*

+++FR MD+++DG+++ EEF++ K+ 
Query 148 RTEKIFRQMDTNRDGKLSLEEFIRGAKSD 176 
DKFZphfbr2_23f2



group: brain derived 

DKFZphfbr2_23f2 encodes a novel 182 amino acid protein with weak similarity to S. pombe
Vps29p.

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 

similarity to Vps29p 

complete cDNA, complete cds, EST hits 

S. cerevisiae and S. pombe Vps29p are involved in vacuolar protein sorting

part of the cDNA is encoded by HSAC2350, splice pattern 4 exons 
Sequenced by AGOWA 
Locus: /map="12q24" 
Insert length: 1016 bp 

Poly A stretch at pos. 996, polyadenylation signal at pos. 974 

1 GAATGGGGAG GAGCCAGAGG AAGAGGGCGG CGACGGTGGT GGTGACTGAG 

51 CGGAGCCCGG TGACAGGATG TTGGTGTTGG TATTAGGAGA TCTGCACATC 

101 CCACACCGGT GCAACAGTTT GCCAGCTAAA TTCAAAAAAC TCCTGGTGCC 

151 AGGAAAAATT CAGCACATTC TCTGCACAGG AAACCTTTGC ACCAAAGAGA 

201 GTTATGACTA CCTCAAGACT CTGGCTGGTG ATGTTCATAT TGTGAGAGGA 

251 GACTTCGATG AGAATCTGAA TTATCCAGAA CAGAAAGTTG TGACTGTTGG 

301 ACAGTTCAAA ATTGGTCTGA TCCATGGACA TCAAGTTATT CCATGGGGAG 

351 ATATGGCCAG CTTAGCCCTG TTGCAGAGGC AATTTGATGT GGACATTCTT 

401 ATCTCGGGAC ACACACACAA ATCTGAAGCA TTTGAGCATG AAAATAAATT 

451 CTACATTAAT CCAGGTTCTG CCACTGGGGC ATATAATGCC TTGGAAACAA

501 ACATTATTCC ATCATTTGTG TTGATGGATA TCCAGGCTTC TACAGTGGTC 

551 ACCTATGTGT ATCAGCTAAT TGGAGATGAT GTGAAAGTAG AACGAATCGA 

601 ATACAAAAAA CCTTAAAGCC AGGCCTGTCT TGATGATTTT TGGTTTTTTT 

651 TCATTGTCCT GTTGAAATCA AGTAATTAAA CATTTAAGAG CCACAAAATT 

701 GTATCACTTT TATAATATTT TGCAGTAAAA TATAATACCA TCTTCTCTGT 

751 TAATACATAA TTGCTCCAAG CTTCCTGTAA ACTATAAGAA TATATTTAGT 

801 TTACAGTATA TGGATTCTAT GAAAAAATGT CCACAACACA GTAATTGGTC 

851 ACTTGTTAAG AAAAATTTAT CCTTGTAAGT ATCTTCAAAG TTGATATTTG 

901 GAACTTTATT CCAAAAGTAG TGCATGTGGA GAAAGAATCT AGACTTTCTT 

951 GTATACATTT TTCTCTTCTC CAGTAATAAA CAATTACCTT TCATTGAAAA 

1001 AAAAAAAAAA AAAAAA 

BLAST Results 



Entry HSAC2350 from database EMBLNEW: 

Homo sapiens 12q24 PAC P424M6 Length = 167,217



Medline entries



No Medline entry



Peptide information for frame 2 



ORF from 68 bp to 613 bp; peptide length: 182 
Category: similarity to known protein 
Prosite motifs: RGD (60-63) 
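The ORF coordinates and peptide lengths in these reports are linked by simple codon arithmetic: 68 to 613 spans 546 bases, i.e. 182 codons. The sketch below assumes 1-based inclusive coordinates that end at the last sense codon (the stop codon is not counted, which is what the listed values imply) and numbers the reading frame 1 to 3 from the start position modulo 3; `orf_summary` is a hypothetical helper, not part of any tool named in this report.

```python
from typing import Tuple


def orf_summary(start: int, end: int) -> Tuple[int, int]:
    """Return (peptide length, reading frame) for a 1-based, inclusive ORF.

    Assumes `end` is the last base of the final sense codon (stop codon not
    included), as the ORF lines in these reports imply, and numbers frames
    1-3 from the start position modulo 3 (remainder 0 -> frame 3).
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF {start}-{end} is not a whole number of codons")
    frame = start % 3 or 3
    return span // 3, frame


if __name__ == "__main__":
    # (start bp, end bp) pairs taken from the ORF lines in this document.
    for start, end in [(68, 613), (29, 1072), (172, 1047), (656, 1072)]:
        print(start, end, orf_summary(start, end))
```

Under this convention all four ORF lines in this section (68-613, 29-1072, 172-1047, 656-1072) give the reported peptide lengths (182, 348, 292, 139) and frames (2, 2, 1, 2).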



1 MLVLVLGDLH IPHRCNSLPA KFKKLLVPGK IQHILCTGNL CTKESYDYLK 
51 TLAGDVHIVR GDFDENLNYP EQKVVTVGQF KIGLIHGHQV IPWGDMASLA
101 LLQRQFDVDI LISGHTHKSE AFEHENKFYI NPGSATGAYN ALETNIIPSF 
151 VLMDIQASTV VTYVYQLIGD DVKVERIEYK KP
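A peptide listing can be checked against the insert sequence by translating the ORF codon by codon. A minimal sketch, assuming the standard genetic code (NCBI translation table 1) and a sequence that starts at the ATG and contains only A, C, G and T; `translate` and `CODON_TABLE` are hypothetical names used for illustration.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# each position in T, C, A, G order, so the string below lists the amino acids
# for TTT, TTC, TTA, TTG, TCT, ... GGG; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate in-frame codons from the first base, stopping at the first stop.

    Assumes the sequence contains only A, C, G and T (spaces are ignored).
    """
    cds = cds.upper().replace(" ", "")
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Bases 68-100 of the DKFZphfbr2_23f2 insert (the ATG at bp 68 starts the ORF).
    print(translate("ATGTTGGTGTTGGTATTAGGAGATCTGCACATC"))
```

Bases 68 to 100 of the DKFZphfbr2_23f2 insert translate to MLVLVLGDLHI, the first eleven residues of the peptide above; bases 599 to 619 (GAATACAAAAAACCTTAAAGC) give EYKKP and then hit the TAA stop right after bp 613, the ORF end.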

BLASTP hits 

Entry CEZK1128_6 from database TREMBL: 
"ZK1128.1"; Caenorhabditis elegans cosmid ZK1128 
Length = 523 

Score = 400 (140.8 bits), Expect = 2.3e-37, P = 2.3e-37
Identities = 81/150 (54%), Positives = 106/150 (70%)

Entry S46793 from database PIR: 

hypothetical protein YHR012c - yeast (Saccharomyces cerevisiae) 
Length = 282 

Score = 180 (63.4 bits), Expect = 3.7e-37, Sum P(3) = 3.7e-37
Identities = 35/71 (49%), Positives = 44/71 (61%) 

Entry AB011824_1 from database TREMBL:
"Vps29"; Schizosaccharomyces pombe mRNA for Vps29,
partial cds. Schizosaccharomyces pombe (fission yeast)
Length = 176

Score = 189 (66.5 bits), Expect = 2.7e-27, Sum P(2) = 2.7e-27
Identities = 33/72 (45%), Positives = 50/72 (69%)
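The percentages printed after the identity and positive counts appear to be truncated rather than rounded: 106/150 is 70.7% but is shown as 70%, and 27/51 (52.9%) in the DKFZphfbr2_23nl6 hits is shown as 52%. A one-line sketch of that convention, inferred from the numbers in this document rather than from BLAST documentation; `blast_percent` is a hypothetical helper.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Percentage printed after a BLAST identity or positive count.

    Inferred from the numbers in this report, not from BLAST documentation:
    the value appears to be truncated toward zero rather than rounded.
    """
    return 100 * matches // aligned


if __name__ == "__main__":
    # Identity and positive counts from the first hit above (ZK1128).
    print(blast_percent(81, 150), blast_percent(106, 150))
```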



Alert BLASTP hits for DKFZphfbr2_23f2, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_23f2, frame 2



Report for DKFZphfbr2_23f2.2



[LENGTH] 182

[MW] 20445.84

[pI] 6.29

[HOMOL] TREMBL:CEZK1128_6 gene: "ZK1128.8"; Caenorhabditis elegans cosmid ZK1128 2e-51

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] r general function prediction [M. jannaschii, MJ0623] 1e-16

[BLOCKS] BL01269D

[BLOCKS] BL01269A

[PROSITE] RGD 1

[PROSITE] MYRISTYL 4

[PROSITE] PKC_PHOSPHO_SITE 1

[KW] Alpha_Beta 



SEQ MLVLVLGDLHIPHRCNSLPAKFKKLLVPGKIQHILCTGNLCTKESYDYLKTLAGDVHIVR 

PRD ccceeecccccccccccchhhhhhhhhhcceeeeeecccccchhhhhhhhhhhhceeeee 

SEQ GDFDENLNYPEQKVVTVGQFKIGLIHGHQVIPWGDMASLALLQRQFDVDILISGHTHKSE 

PRD cccccccccccceeeeeccceeeeecccccccccchhhhhhhhhhhcceeeeeccccccc 

SEQ AFEHENKFYINPGSATGAYNALETNIIPSFVLMDIQASTVVTYVYQLIGDDVKVERIEYK

PRD ccccccccccccccccccccccccccccceeeeeccccceeeeeeeecccceeeeeeeec 

SEQ KP 

PRD cc 



Prosite for DKFZphfbr2_23f2.2



PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005
PS00008 38->44 MYRISTYL PDOC00008
PS00008 83->89 MYRISTYL PDOC00008
PS00008 133->139 MYRISTYL PDOC00008
PS00008 137->143 MYRISTYL PDOC00008
PS00016 60->63 RGD PDOC00016
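The Prosite coordinates above can be reproduced by scanning the peptide with the corresponding patterns. The sketch transcribes four PROSITE patterns into Python regular expressions (our transcription, checked only against this entry) and reports each hit as its first residue and first residue plus match length, which is the "start->end" convention these tables use (RGD at residues 60 to 62 is listed as 60->63). A zero-width lookahead lets overlapping hits such as MYRISTYL 133->139 and 137->143 both be found. `PATTERNS` and `scan` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Transcription of four PROSITE patterns into Python regex syntax:
# 'x' -> '.', '{...}' (excluded residues) -> '[^...]', 'x(2)' -> '..'.
PATTERNS: Dict[str, str] = {
    "PKC_PHOSPHO_SITE": "[ST].[RK]",            # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": "[ST]..[DE]",           # PS00006: [ST]-x(2)-[DE]
    "MYRISTYL": "G[^EDRKHPFYW]..[STAGCN][^P]",  # PS00008: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
    "RGD": "RGD",                               # PS00016: R-G-D
}

# DKFZphfbr2_23f2.2 peptide (182 residues), copied from the SEQ lines above.
PEPTIDE_23F2 = (
    "MLVLVLGDLHIPHRCNSLPAKFKKLLVPGKIQHILCTGNLCTKESYDYLKTLAGDVHIVR"
    "GDFDENLNYPEQKVVTVGQFKIGLIHGHQVIPWGDMASLALLQRQFDVDILISGHTHKSE"
    "AFEHENKFYINPGSATGAYNALETNIIPSFVLMDIQASTVVTYVYQLIGDDVKVERIEYK"
    "KP"
)


def scan(peptide: str, regex: str) -> List[Tuple[int, int]]:
    """Return all hits as (start, end): start is 1-based, end = start + length."""
    hits = []
    # Wrapping the pattern in a lookahead makes each match zero-width, so
    # finditer also reports matches that overlap earlier ones.
    for m in re.finditer("(?=(" + regex + "))", peptide):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


if __name__ == "__main__":
    for name, regex in PATTERNS.items():
        print(name, scan(PEPTIDE_23F2, regex))
```

On this peptide the scan finds RGD at 60->63, one PKC site at 116->119, the four MYRISTYL sites listed above and no CK2 site, in line with the [PROSITE] counts in the Pedant report.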



(No Pfam data available for DKFZphfbr2_23f2.2)
DKFZphfbr2_23124



group: intracellular transport and trafficking 

DKFZphfbr2_23124.2 encodes a novel 348 amino acid protein with similarity to human
glycoprotein gp36b and canine VIP36 glycoprotein.

The vesicular protein VIP36 (36 kDa vesicular integral membrane protein) shows homology to
leguminous plant lectins. The protein is localized to the Golgi apparatus, endosomal and
vesicular structures and the plasma membrane. VIP36 binds to sugar residues of
glycosphingolipids and/or glycosylphosphatidylinositol anchors and might provide a link
between the extracellular/luminal face of glycolipid rafts and the cytoplasmic protein
segregation machinery. gp36b is located within the endoplasmic reticulum. A lectin character
is predicted for the novel protein. Given the intracellular localisation of its homologs,
it is likely to be involved in intracellular transport and trafficking.

The new protein can find application in modulating/blocking intracellular transport and 
trafficking. 



strong similarity to human GP36b glycoprotein 
complete cDNA, complete cds, EST hits 

potential start at bp 29 matches Kozak consensus ANNatgG
similarity to lectins

Sequenced by AGOWA 

Locus: /map="2"

Insert length: 2416 bp 

Poly A stretch at pos. 2394, no polyadenylation signal found 
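The ANNatgG note above tests two positions around the start codon: an A three bases upstream of the ATG and a G immediately after it. A minimal check of exactly that rule (not a full Kozak scoring scheme); `matches_kozak` and `INSERT_START` are hypothetical names used for illustration.

```python
def matches_kozak(seq: str, atg_pos: int) -> bool:
    """Test the ANNatgG rule for an ATG whose A is at 1-based position atg_pos.

    Checks only the two positions named in the annotation: A three bases
    upstream of the ATG (-3) and G immediately after it (+4). This is not a
    full Kozak scoring scheme.
    """
    i = atg_pos - 1  # 0-based index of the A of ATG
    if i < 3 or i + 3 >= len(seq):
        return False  # not enough flanking sequence to evaluate
    return seq[i:i + 3] == "ATG" and seq[i - 3] == "A" and seq[i + 3] == "G"


# Bases 1-40 of the DKFZphfbr2_23124 insert listed below.
INSERT_START = "GGGGGATGAAGGGTCGTTGGTGGGAAAGATGGCGGCGACT"

if __name__ == "__main__":
    print(matches_kozak(INSERT_START, 29), matches_kozak(INSERT_START, 6))
```

Under this check the ATG at bp 29 passes (A at bp 26, G at bp 32), whereas the upstream ATG at bp 6 does not (G at bp 3, A at bp 9).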



1 GGGGGATGAA GGGTCGTTGG TGGGAAAGAT GGCGGCGACT CTGGGACCCC 
51 TTGGGTCGTG GCAGCAGTGG CGGCGATGTT TGTCGGCTCG GGATGGGTCC 
101 AGGATGTTAC TCCTTCTTCT TTTGTTGGGG TCTGGGCAGG GGCCACAGCA 
151 AGTCGGGGCG GGTCAAACGT TCGAGTACTT GAAACGGGAG CACTCGCTGT 
201 CGAAGCCCTA CCAGGGTGTG GGCACAGGCA GTTCCTCACT GTGGAATCTG 
251 ATGGGCAATG CCATGGTGAT GACCCAGTAT ATCCGCCTTA CCCCAGATAT 
301 GCAAAGTAAA CAGGGTGCCT TGTGGAACCG GGTGCCATGT TTCCTGAGAG 
351 ACTGGGAGTT GCAGGTGCAC TTCAAAATCC ATGGACAAGG AAAGAAGAAT 
401 CTGCATGGGG ATGGCTTGGC AATCTGGTAC ACAAAGGATC GGATGCAGCC 
451 AGGGCCTGTG TTTGGAAACA TGGACAAATT TGTGGGGCTG GGAGTATTTG 
501 TAGACACCTA CCCCAATGAG GAGAAGCAGC AAGAGCGGGT ATTCCCCTAC 
551 ATCTCAGCCA TGGTGAACAA CGGCTCCCTC AGCTATGATC ATGAGCGGGA 
601 TGGGCGGCCT ACAGAGCTGG GAGGCTGCAC AGCCATTGTC CGCAATCTTC 
651 ATTACGACAC CTTCCTGGTG ATTCGCTACG TCAAGAGGCA TTTGACGATA 
701 ATGATGGATA TTGATGGCAA GCATGAGTGG AGGGACTGCA TTGAAGTGCC 
751 CGGAGTCCGC CTGCCCCGCG GCTACTACTT CGGCACCTCC TCCATCACTG 
801 GGGATCTCTC AGATAATCAT GATGTCATTT CCTTGAAGTT GTTTGAACTG 
851 ACAGTGGAGA GAACCCCAGA AGAGGAAAAG CTCCATCGAG ATGTGTTCTT 
901 GCCCTCAGTG GACAATATGA AGCTGCCTGA GATGACAGCT CCACTGCCGC 
951 CCCTGAGTGG CCTGGCCCTC TTCCTCATCG TCTTTTTCTC CCTGGTGTTT 
1001 TCTGTATTTG CCATAGTCAT TGGTATCATA CTCTACAACA AATGGCAGGA 
1051 ACAGAGCCGA AAGCGCTTCT ACTGAGCCCT CCTGCTGCCA CCACTTTTGT 
1101 GACTGTCACC CATGAGGTAT GGAAGGAGCG GGCACTGGCC TGAGCATGCA 
1151 GCCTGGAGAG TGTTCTTGTC TCTAGCAGCT GGTTGGGGAC TATATTCTGT 
1201 CACTGGAGTT TTGAATGCAG GGACCCCGCA TTCCCATGGT TGTGCATGGG 
1251 GACATCTAAC TCTGGTCTGG GAAGCCACCC ACCCCAGGGC AATGCTGCTG 
1301 TGATGTGCCT TTCCCTGCAG TCCTTCCATG TGGGAGCAGA GGTGTGAAGA 
1351 GAATTTACGT GGTTGTGATG CCAAAATCAC GGAACAGAAT TTCATAGCCC 
1401 AGGCTGCCGT GTTGTTTGAC TCAGAAGGCC CTTCTACTTC AGTTTTGAAT 
1451 CCACAAAGAA TTAAAAACTG GTAACACCAC AGGCTTTCTG ACCATCCATT 
1501 CGTTGGGTTT TGCATTTGAC CCAACCCTCT GCCTACCTGA GGAGCTTTCT 
1551 TTGGAAACCA GGATGGAAAC TTCTTCCCTG CCTTACCTTC CTTTCACTCC 
1601 ATTCATTGTC CTCTCTGTGT GCAACCTGAG CTGGGAAAGG CATTTGGATG 
1651 CCTCTCTGTT GGGGCCTGGG GCTGCAGAAC ACACCTGCGT TTCGCTGGCC 
1701 TTCATTAGGT GGCCCTAGGG AGATGGCTTT CTGCTTTGGA TCACTGTTCC 
1751 CTAGCATGGG TCTTGGGTCT ATTGGCATGT CCATGGCCTT CCCAATCAAG 
1801 TCTCTTCAGG CCCTCAGTGA AGTTTGGCTA AAGGTTGGTG TAAAAATCAA 
1851 GAGAAGCCTG GAAGACACCA TGGATGCCAT GGATTAGCTG TGCAACTGAC 
1901 CAGCTCCAGG TTTGATCAAA CCAAAAGCAA CATTTGTCAT GTGGTCTGAC 
1951 CATGTGGAGA TGTTTCTGGA CTTGCTAGAG CCTGCTTAGC TGCATGTTTT 
2001 GTAGTTACGA TTTTTGGAAT CCCTCTTTGA GTGCTGAAAG TGTAAGGAAG 
2051 CTTTCTTCTT ACACCTTGGG CTTGGATATT GCCCAGAGAA GAAATTTGGC 
2101 TTTTTTTTCT TAATGGACAA GGGACAGTTG CTGTTCTCAT GTTCCAAGTC 
2151 TGAGAGCAAC AGACCCTCAT CATCTGTGCC TGGAAGAGTT CACTGTCATT 
2201 GAGCAGCACA GCCTGAGTGC TGGCCTCTGT CAACCCTTAT TCCACTGCCT 
2251 TATTTGACAA GGGGTTACAT GCTGCTCACC TTACTGCCCT GGGATTAAAT

2301 CAGTTACAGG CCAGAGTCTC CTTGGAGGGC CTGGAACTCT GAGTCCTCCT 

2351 ATGAACCTCT GTAGCCTAAA TGAAATTCTT AAAATCACCG ATGGAACCAA 

2401 AAAAAAAAAA AAAAAA 



BLAST Results 



Entry HS622145 from database EMBL: 
human STS WI-6746. 
Score = 1079, P = 5.1e-43, identities = 219/223

Entry G42541 from database EMBLNEW:

SHGC-58649 Human Homo sapiens STS genomic, sequence tagged site.
Score = 1091, P = 1.7e-43, identities = 219/220



Medline entries 



94265253: 

A putative novel class of animal lectins in the secretory pathway
homologous to leguminous lectins.

94208543:

VIP36, a novel component of glycolipid rafts and exocytic carrier 
vesicles in epithelial cells. 



Peptide information for frame 2 



ORF from 29 bp to 1072 bp; peptide length: 348 
Category: strong similarity to known protein 



1 MAATLGPLGS WQQWRRCLSA RDGSRMLLLL LLLGSGQGPQ QVGAGQTFEY 

51 LKREHSLSKP YQGVGTGSSS LWNLMGNAMV MTQYIRLTPD MQSKQGALWN 

101 RVPCFLRDWE LQVHFKIHGQ GKKNLHGDGL AIWYTKDRMQ PGPVFGNMDK 

151 FVGLGVFVDT YPNEEKQQER VFPYISAMVN NGSLSYDHER DGRPTELGGC 

201 TAIVRNLHYD TFLVIRYVKR HLTIMMDIDG KHEWRDCIEV PGVRLPRGYY 

251 FGTSSITGDL SDNHDVISLK LFELTVERTP EEEKLHRDVF LPSVDNMKLP 

301 EMTAPLPPLS GLALFLIVFF SLVFSVFAIV IGIILYNKWQ EQSRKRFY 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_23124, frame 2 

PIR:G01447 GP36b glycoprotein - human, N = 1, Score = 1001, P = 
5.9e-101 

SWISSPROT:VP36_CANFA VESICULAR INTEGRAL-MEMBRANE PROTEIN VIP36 
PRECURSOR (VIP36)., N = 1, Score = 990, P = 8.6e-100 

TREMBL:CET04G9_2 gene: "T04G9.3"; Caenorhabditis elegans cosmid
T04G9., N = 1, Score = 614, P = 6e-60

PIR:S42626 ER-golgi intermediate compartment protein - human, N = 2,
Score = 397, P = 1e-42



>PIR:G01447 GP36b glycoprotein - human 
Length = 356 

HSPs: 

Score = 1001 (150.2 bits), Expect = 5.9e-101, P = 5.9e-101
Identities = 197/356 (55%), Positives = 256/356 (71%)

Query: 1 MAATLGPLGSWQQWRRCLSARDG SRMLLLLLLLGSGQGPQQVGAGQTFEYLK 52

MAA G + W RRCL R G + L LLLLLGS + G + E+LK

Sbjct: 1 MAAE-GWIWRWGWGRRCLG-RPGLLGPGPGPTTPLFLLLLLGSVTA--DITDGNS-EHLK 55
Query: 53 REHSLSKPYQGVGTGSSSLWNLMGNAMVMTQYIRLTPDMQSKQGALWNRVPCFLRDWELQ 112

REHSL KPYQGVG+ S LW+ G+ M+ +QY+RLTPD +SK+G++WN PCFL+DWE+ 
Sbjct: 56 REHSLIKPYQGVGSSSMPLWDFQGSTMLTSQYVRLTPDERSKEGSIWNHQPCFLKDWEMH 115 

Query: 113 VHFKIHGQGKKNLHGDGLAIWYTKDRMQPGPVFGNMDKFVGLGVFVDTYPNEEKQQERVF 172 

VHFK+HG GKKNLHGDG+A+WYT+DR+ PGPVFG+ D F GL +F+DTYPN+E ERVF 
Sbjct: 116 VHFKVHGTGKKNLHGDGIALWYTRDRLVPGPVFGSKDNFHGLAIFLDTYPNDETT-ERVF 174 

Query: 173 PYISAMVNNGSLSYDHERDGRPTELGGCTAIVRNLHYDTFLVIRYVKRHLTIMMDIDGKH 232 

PYIS MVNNGSLSYDH +DGR TEL GCTA RN +DTFL +RY + LT+M D++ K+ 
Sbjct: 175 PYISVMVNNGSLSYDHSKDGRWTELAGCTADFRNRDHDTFLAVRYSRGRLTVMTDLEDKN 234 

Query: 233 EWRDCIEVPGVRLPRGYYFGTSSITGDLSDNHDVISLKLFELTVERTPEEEKLHRDVFLP 292 

EW++CI++ GVRLP GYYFG S+ TGDLSDNHD+I S+KLF+L VE TP+EE + P 
Sbjct: 235 EWKNCIDITGVRLPTGYYFGASAGTGDLSDNHDIISMKLFQLMVEHTPDEESIDWTKIEP 294 

Query: 293 SVDNMKLPEMTAPLP PLSGLALFLIVFFSLVFSVFAIVIGIILYNKWQEQSRK 345 

SV+ +K P+ P PL+G +FL++ +L+ V V+G +++ K QE++ K 

Sbjct: 295 SVNFLKSPKDNVDDPTGNFRSGPLTGWRVFLLLLCALLGIVVCAWGAVVFQKRQERN-K 353

Query: 346 RFY 348 
RFY 

Sbjct: 354 RFY 356 



Pedant information for DKFZphfbr2_23124, frame 2



Report for DKFZphfbr2_23124.2



[LENGTH] 348

[MW] 39711.10

[pI] 8.55

[HOMOL] PIR:G01447 GP36b glycoprotein - human 1e-101

[PIRKW] lectin 2e-37

[PIRKW] transmembrane protein 2e-37

[PIRKW] endoplasmic reticulum 2e-37

[PIRKW] Golgi apparatus 2e-37

[PROSITE] AMIDATION 1

[PROSITE] MYRISTYL 5

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta

[KW] SIGNAL_PEPTIDE 39

[KW] LOW_COMPLEXITY 7.76 %



SEQ MAATLGPLGSWQQWRRCLSARDGSRMLLLLLLLGSGQGPQQVGAGQTFEYLKREHSLSKP 

SEG xxxxxxx . 

PRD ccccccccccccccccccccccchhhhhhhhhhhcccccccccccchhhhhhhhhhhccc 

SEQ YQGVGTGSSSLWNLMGNAMVMTQYIRLTPDMQSKQGALWNRVPCFLRDWELQVHFKIHGQ

SEG 

PRD cccccccccceeecccccccccceeeeccchhhhhcccccccccchhhhhhhheeeeecc 

SEQ GKKNLHGDGLAIWYTKDRMQPGPVFGNMDKFVGLGVFVDTYPNEEKQQERVFPYISAMVN 

SEG 

PRD ccccccccceeeeeecccccccccccccccccceeeeeecccccccccccccceeeeeec 

SEQ NGSLSYDHERDGRPTELGGCTAIVRNLHYDTFLVIRYVKRHLTIMMDIDGKHEWRDCIEV 

SEG 

PRD ccccccccccccccccccccccccccccccceeeehhhhhhheeeeeccccccccccccc 

SEQ PGVRLPRGYYFGTSSITGDLSDNHDVISLKLFELTVERTPEEEKLHRDVFLPSVDNMKLP 

SEG 

PRD cccccccccccccccccccccccchhhhhhhhhhhhhccccccccccccccccccccccc 

SEQ EMTAPLPPLSGLALFLIVFFSLVFSVFAIVIGIILYNKWQEQSRKRFY 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc 
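The LOW_COMPLEXITY value in the Pedant report appears to be the share of residues masked by SEG: 7.76% of 348 residues corresponds to 27 masked positions, and 4.11% of 292 residues (DKFZphfbr2_23nl6.1) to 12. A minimal sketch under that assumption, which is inferred from the numbers rather than from Pedant documentation; `low_complexity_percent` is a hypothetical helper.

```python
def low_complexity_percent(seg_mask: str, length: int) -> float:
    """Percentage of residues masked as 'x' by SEG, rounded to two decimals.

    Assumption: inferred from the reported values, not from Pedant documentation.
    """
    if length <= 0:
        raise ValueError("peptide length must be positive")
    return round(100 * seg_mask.count("x") / length, 2)


if __name__ == "__main__":
    # 27 masked positions in the 348-residue DKFZphfbr2_23124.2 peptide.
    print(low_complexity_percent("x" * 27, 348))
```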



Prosite for DKFZphfbr2_23124.2

PS00001 181->185 ASN_GLYCOSYLATION PDOC00001
PS00002 35->39 GLYCOSAMINOGLYCAN PDOC00002
PS00005 19->22 PKC_PHOSPHO_SITE PDOC00005
PS00005 268->271 PKC_PHOSPHO_SITE PDOC00005
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005
PS00006 19->23 CK2_PHOSPHO_SITE PDOC00006
PS00006 279->283 CK2_PHOSPHO_SITE PDOC00006
PS00008 43->49 MYRISTYL PDOC00008
PS00008 63->69 MYRISTYL PDOC00008
PS00008 65->71 MYRISTYL PDOC00008
PS00008 96->102 MYRISTYL PDOC00008
PS00008 198->204 MYRISTYL PDOC00008
PS00009 120->124 AMIDATION PDOC00009



(No Pfam data available for DKFZphfbr2_23124.2)
DKFZphfbr2_23nl6



group: signal transduction 

DKFZphfbr2_23nl6.1 encodes a novel 292 amino acid protein with weak similarity to putative
phosphatidylinositol-4-phosphate 5-kinase of Arabidopsis thaliana.

The novel protein contains a WW domain, which was originally described as a short
conserved region in a number of unrelated proteins, among them dystrophin, the product of the
gene responsible for Duchenne muscular dystrophy. The domain spans about 35 residues and is
repeated up to 4 times in some proteins. It has been shown to bind proteins with particular
proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. This domain is
frequently associated with other domains typical of proteins involved in signal transduction.
Examples of proteins containing the WW domain are dystrophin, utrophin, vertebrate YAP protein
(binds the SH3 domain of the Yes oncoprotein), murine NEDD-4 (embryonic development and
differentiation of the central nervous system) and IQGAP (human GTPase-activating protein
acting on Ras). The new protein is therefore likely to be involved in intracellular signal
transduction.

The new protein can find application in modulating/blocking intracellular signal transduction
pathways.



similarity to putative phosphatidylinositol-4-phosphate 5-kinase 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus: unknown

Insert length: 2936 bp 

Poly A stretch at pos. 2916, polyadenylation signal at pos. 2873 



1 GGGGGCGCTC CCGAGAAAGA GTGAGGGCGC GACGCGCACC AACGGTGGAG
51 GGATGTTTCA GCAGCCCCTG AGAAGGAAGA GGAGGAAGCT GAGGGCCCGC
101 TGAGGGCGCA GGACCTGAGG GAGTCCTACA TCCAGCTCGT CCAGGGTGTG
151 CAGGAGTGGC AGGATGGTTG CATGTACCAG GGGGAGTTTG GGTTGAACAT
201 GAAGCTTGGA TATGGCAAAT TCTCTTGGCC CACAGGCGAG TCATACCATG
251 GGCAGTTTTA CCGGGACCAC TGCCATGGCC TGGGTACCTA CATGTGGCCA
301 GATGGCTCCA GTTTCACGGG CACATTTTAC CTCAGCCACC GAGAAGGCTA
351 CGGCACCATG TACATGAAGA CACGGCTTTT CCAGACTCAC TGCCACAACG
401 ACATTGTCAA CCTTCTCCTG GACTGTGGGG CCGACGTGAA CAAGTGCTCA
451 GATGAGGGTC TCACGGCACT CAGCATGTGT TTCCTCCTCC ACTACCCCGC
501 CCAGTCCTTC AAGCCCAATG TTGCTGAACG GACCATACCT GAGCCCCAGG
551 AACCTCCAAA ATTCCCAGTT GTTCCAATCC TTTCATCATC ATTTATGGAC
601 ACAAACCTGG AGTCTCTGTA CTATGAGGTG AACGTGCCTT CCCAGGGTAG
651 CTATGAGCTG AGGCCACCGC CAGCACCACT GCTCCTGCCA CGCGTCTCAG
701 GCAGCCACGA GGGCGGCCAC TTCCAGGACA CCGGGCAGTG TGGGGGGTCC
751 ATAGACCACA GGAGCAGCTC TCTGAAGGGG GACTCCCCGT TGGTGAAGGG
801 CAGCCTTGGC CATGTGGAAA GCGGGCTTGA GGACGTGTTG GGAGACACAG
851 ACCGGGGCAG TCTGTGCAGT GCTGAGACGA AATTTGAGTC CAACTTGTGT
901 GTGTGCGACT TCTCCATCGA GCTCTCGCAG GCCATGCTGG AGAGAAGCGC
951 CCAGTCCCAC AGCTTGCTGA AGATGGCCTC GCCCTCACCG TGCACCAGCA
1001 GCTTCGACAA AGGGACCATG CGGAGGATGG CGCTGTCCAT GATCGAGTAG
1051 GTCCTGGCAC CAGCTGGTGG GGGTGGAGGG CCACCATCAG GGCTGAATCC
1101 TATGCTCAGC AGACCCACGT CTCTTCCCTG TGCCAGTGGG AGGCGTTGTG
1151 TCTGGAGATG TGTGTCTGAA TGTGTGAGCA TCCCTGTGTC GGTGGCTCCA
1201 TGCCATGGCC AGCCCTGTGG GGGTGCCACG GTGACGGGCT GTTTTCAGTG
1251 CCACCCCAGC CCTGTGGGGG TGCCACGGTG ACGGGCTGTT TTCAGTACCA
1301 CGCCAGCCCT GCTTTGGCCT TTGGCACTGG CCTGAAGTGT CTCTGTGGGA
1351 GCCTCAGCAG GGGCCACTGT CAGGGGTCCT ATCCTAGCCA TAGTGCACGT
1401 GAGTGACACC TGCCTGGGCA GCTCTCACAC CCCTGCTGTC CACCCTGTCT
1451 ATACCAGTGT GTCTCAAAAT GTGGTCTATG CACCCCCGGG GGTCCAAGAC
1501 CCTTTCAGGG AGTCTGTGGG GTCAAAATGA TTCTCTTGAT AACCCTGAGA
1551 CTCTGTTAGC CTTCTCCTTG TGTTGATGTT GGTGGATGGT ATGAAGACAG
1601 GGCCGTGCAG ACCACCAGCC CCCAGCGTGC AGGGCAGCAG TGCCCGGCCT
1651 GCTTGGGGGC ATGGTATTCC TTCACCACGG TGTGCACTTG CGGGGATGCC
1701 TGTCTCACTG AAGAATGCCT TTGACTAAGC AGAAAAGCAA TGACAAATTG
1751 CATTAAATCT TGCTCCTTGC GTACACACCC CTCGAATATT CTGGGTCGGA
1801 AAACATGGGA AGGACACTGA TGTGTGTCTG CCACAGACCA AGGCACACCG
1851 CTTCCCCGCA AGAAGCGCTT CCCCCAGGGC CAGAGTAGCA ACAGAATGCG
1901 GCATCTTCCC AACCTCCTGC CCCATTTTTG ATTGGAAGAA TGACCACTGG
1951 TATGTGGCTG TTCATTCTCC TGAACACAGC CTGCCACTTT AAGGAAAACA
2001 TATGACACTA TTTGTTGCTG GCGAAATTTA CATTTTCAAG TGAATAGCAG
2051 AATTCTGGAC ACTTGCCACC ACCACCAAAA CCTTCATAGC TTCCCTTAAC
2101 TTTGAGACAT GGGTGTTCAG AGGTTTTTCA CGTGAGATGG CGTTAGCAGC
2151 GCAGTTTTGT GATACTGCCT GAAGACATGC CGACAGTGCC CAGATCTCTT
2201 CTATTGGTGA GCCAGCTTTT CCCACACGGC CAAGTTCTGA TGTTGAACCA
2251 TTGCCAGGTG GGTGAAGATC CATTGACAGT GAGAGGTGGG CCCGTGGGCT 
2301 TCAGTGCAGC CAGGCGCAGA AGGCTGGTTC ATGAGTGTCC AGCTCCGCCA 
2351 GGTAGCTAGC TCACCACCCC CAGCCTGGGT TCATGTAGTT CAAATAGGAA 
2401 GACCACGATG ATCAGAAAGG CTGCTCAAAT ACTCCTTCGT CCAGCCGCGT 
2451 ACCTGGGGGA GGCTGAATCT CCACTCACTT CCACCAAGGC TGTGCAGAGC 
2501 AGATAGGGGA ATCCAGCAAA GGTGGAAAAC AGTGCCATCC TTCTCCCCAA 
2551 CTGGTTTTGT TTTGTAAAAT AACTTTTTGT GACAGTGTTA CTTATTAGTA 
2601 ACATGCAGTG GGTTTGTTAT GGTTAACAAG TTGGTGAGCA TTATTGAGAG 
2651 GTGAAGCCAG CTGAGCTTCT GGGTTGGGTG GGGACTTGGA GAACTTTTGT 
2701 GTCTAGCTAA AGGATTGTAA ATGCACCAAT CAATGCTCAG TGTCTAGCTA 
2751 AAGGATTGTA AATGCACCAA TCAGCACTCT GTAAAATTGA CCAATCAGCG 
2801 TTCTGTAAAA TGGACCAATC AGTGGTCTGT AAAATGGACC AGTCAGCAGG 
2851 ATGTGGGCGG GGCCAAAAAA GGGAATAAAA GCTGGCCACC GCCAGGCTCC 
2901 CCACCAGCCT GCAGCGAAAA AAAAAAAAAA AAAAAA 



BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 



Peptide information for frame 1 



ORF from 172 bp to 1047 bp; peptide length: 292 
Category: similarity to unknown protein 
Prosite motifs: WW_DOMAIN_1 (19-44)



1 MYQGEFGLNM KLGYGKFSWP TGESYHGQFY RDHCHGLGTY MWPDGSSFTG 

51 TFYLSHREGY GTMYMKTRLF QTHCHNDIVN LLLDCGADVN KCSDEGLTAL 

101 SMCFLLHYPA QSFKPNVAER TIPEPQEPPK FPVVPILSSS FMDTNLESLY 

151 YEVNVPSQGS YELRPPPAPL LLPRVSGSHE GGHFQDTGQC GGSIDHRSSS 

201 LKGDSPLVKG SLGHVESGLE DVLGDTDRGS LCSAETKFES NLCVCDFSIE 

251 LSQAMLERSA QSHSLLKMAS PSPCTSSFDK GTMRRMALSM IE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_23nl6, frame 1 

TREMBL:AB005902_1 product: "AtPIP5K1"; Arabidopsis thaliana mRNA for

AtPIP5K1, complete cds., N = 2, Score = 138, P = 1.1e-06

TREMBL:AF019380_1 product: "putative phosphatidylinositol-4-phosphate
5-kinase"; Arabidopsis thaliana putative

phosphatidylinositol-4-phosphate 5-kinase mRNA, complete cds., N = 2,
Score = 138, P = 1.4e-06

PIR:T02098 probable phosphatidylinositol-4-phosphate 5-kinase -
Arabidopsis thaliana, N = 2, Score = 135, P = 6.7e-06

>TREMBL:AB005902_1 product: "AtPIP5K1"; Arabidopsis thaliana mRNA for
AtPIP5K1, complete cds.
Length = 683 

HSPs: 

Score = 138 (20.7 bits), Expect = 1.1e-06, Sum P(2) = 1.1e-06
Identities = 23/61 (37%), Positives = 35/61 (57%)

Query: 1 MYQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGY 60 

MY+G++ G GKFSWP+G +Y G+F G GT+ DG ++ GT+ + G+ 

Sbjct: 34 MYEGDWKRGKASGKGKFSWPSGATYEGEFKSGRMEGFGTFTGADGDTYRGTWVADRKHGH 93 

Query: 61 G 61 
G 

Sbjct: 94 G 94 
Score = 112 (16.8 bits), Expect = 9.7e-04, Sum P(2) = 9.7e-04
Identities = 19/51 (37%), Positives = 27/51 (52%)

Query: 12 LGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYGT 62 

+G GK+ W G YG+R GG+WP G+++ G F EG+GT 
Sbjct: 22 IGSGKYLWKDGCMYEGDWKRGKASGKGKFSWPSGATYEGEFKSGRMEGFGT 72 

Score = 97 (14.6 bits), Expect = 4.4e-02, Sum P(2) = 4.3e-02
Identities = 19/60 (31%), Positives = 32/60 (53%)

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYG 61 

Y+GEF G+G F+ G++Y G + D HG G + +G + GT+ + ++G G 

Sbjct: 58 YEGEFKSGRMEGFGTFTGADGDTYRGTWVADRKHGHGQKRYANGDFYEGTWRRNLQDGRG 117 

Score = 93 (14.0 bits), Expect = 1.2e-01, Sum P(2) = 1.1e-01
Identities = 18/62 (29%), Positives = 34/62 (54%)

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYG 61 

Y+G + + K G+G+ + G+ Y G + R+ G G Y+W +G+ +TG + + G G 
Sbjct: 81 YRGTWVADRKHGHGQKRYANGDFYEGTWRRNLQDGRGRYVWRNGNQYTGEWRIGVISGKG 140 

Query: 62 TM 63 
+ 

Sbjct: 141 LL 142 

Score = 91 (13.7 bits), Expect = 2.0e-01, Sum P(2) = 1.8e-01 
Identities = 18/51 (35%), Positives = 24/51 (47%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTF 52 

Y GE+ ++GG WPGYG+ GG+W DGSS G + 

Sbjct: 127 YTGEWRIGVISGKGLLVWPNGNRYEGLWENGIPKGNGVFTWSDGSSCVGAW 177 

Score = 90 (13.5 bits), Expect = 2.6e-01, Sum P(2) = 2.3e-01 
Identities = 17/60 (28%), Positives = 31/60 (51%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYG 61 

Y+G + N++ G G++ W G Y G++ G G +WP+G+ + G + +G G 

Sbjct: 104 YEGTWRRNLQDGRGRYVWRNGNQYTGEWRIGVISGKGLLVWPNGNRYEGLWENGIPKGNG 163 

Score = 45 (6.8 bits), Expect = 1.1e-06, Sum P(2) = 1.1e-06
Identities = 14/62 (22%), Positives = 26/62 (41%)

Query: 215 VESGLEDVLGDTDRGSLCSAETKFESNLCVCDF--SIELSQAMLERSAQSHSLLKMASPS 272

V+SG + G+ +C E+ E+ CD ++E S +R + + + 

Sbjct: 205 VDSGAGSLGGEKVFPRICIWESDGEAGDITCDIIDNVEASMIYRDRISVDRDGFRQFKKN 264

Query: 273 PC 274 
PC 

Sbjct: 265 PC 266 



Pedant information for DKFZphfbr2_23nl6, frame 1 



Report for DKFZphfbr2_23nl6.1



[LENGTH] 292 

[MW] 32214.44

[pI] 5.51

[HOMOL] TREMBL:AB005902_1 product: "AtPIP5K1"; Arabidopsis thaliana mRNA for AtPIP5K1,
complete cds. 7e-08

[BLOCKS] BL01137A Hypothetical YBL055c/yjjV family proteins

[PROSITE] WW_DOMAIN_1 1

[PROSITE] MYRISTYL 5

[PROSITE] CK2_PHOSPHO_SITE 7

[PROSITE] PKC_PHOSPHO_SITE 5

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 4.11 %



SEQ MYQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGY 

SEG 

PRD cccccccccccccccceeeccccccccccccccccccccccccccccccceeeeeccccc 

SEQ GTMYMKTRLFQTHCHNDIVNLLLDCGADVNKCSDEGLTALSMCFLLHYPAQSFKPNVAER 

SEG 

PRD cccchhhhhheeeccccchhhhhcccccccccccccchhhhhhhhhccccccccccceee 

SEQ TIPEPQEPPKFPVVPILSSSFMDTNLESLYYEVNVPSQGSYELRPPPAPLLLPRVSGSHE

SEG xxxxxxxxxxxx

PRD eccccccccceeeeeeeccccccccccceeeeeecccccccccccccccccccccccccc

SEQ GGHFQDTGQCGGSIDHRSSSLKGDSPLVKGSLGHVESGLEDVLGDTDRGSLCSAETKFES

SEG

PRD cccccccccccccccccccccccccceeecccccccccccccccccccccceeeeecccc

SEQ NLCVCDFSIELSQAMLERSAQSHSLLKMASPSPCTSSFDKGTMRRMALSMIE

SEG

PRD cccccchhhhhhhhhhhhhhhhhhhhcccccccccccccccchhhhhhhccc



Prosite for DKFZphfbr2_23nl6.1



PS00005 55->58 PKC_PHOSPHO_SITE PDOC00005
PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005
PS00005 200->203 PKC_PHOSPHO_SITE PDOC00005
PS00005 226->229 PKC_PHOSPHO_SITE PDOC00005
PS00005 282->285 PKC_PHOSPHO_SITE PDOC00005
PS00006 55->59 CK2_PHOSPHO_SITE PDOC00006
PS00006 121->125 CK2_PHOSPHO_SITE PDOC00006
PS00006 140->144 CK2_PHOSPHO_SITE PDOC00006
PS00006 144->148 CK2_PHOSPHO_SITE PDOC00006
PS00006 217->221 CK2_PHOSPHO_SITE PDOC00006
PS00006 236->240 CK2_PHOSPHO_SITE PDOC00006
PS00006 276->280 CK2_PHOSPHO_SITE PDOC00006
PS00008 45->51 MYRISTYL PDOC00008
PS00008 86->92 MYRISTYL PDOC00008
PS00008 177->183 MYRISTYL PDOC00008
PS00008 188->194 MYRISTYL PDOC00008
PS00008 229->235 MYRISTYL PDOC00008
PS01159 19->44 WW_DOMAIN_1 PDOC50020
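The WW_DOMAIN_1 hit at 19->44 can be reproduced with the PROSITE signature for this domain, which we transcribe as W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P; treat that transcription as an assumption to check against the current PROSITE entry. Variable gaps such as x(9,11) become bounded repeats in a regular expression. `WW_DOMAIN_1` and `PEPTIDE_23NL6_START` are hypothetical names.

```python
import re

# PROSITE WW_DOMAIN_1 (PS01159) as transcribed here (an assumption to verify):
# W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P
# x(m,n) becomes the bounded repeat .{m,n}; x(2) becomes '..'.
WW_DOMAIN_1 = re.compile(r"W.{9,11}[VFY][FYW].{6,7}[GSTNE][GSTQCR][FYW]..P")

# Residues 1-50 of the DKFZphfbr2_23nl6.1 peptide listed above.
PEPTIDE_23NL6_START = "MYQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTG"

if __name__ == "__main__":
    m = WW_DOMAIN_1.search(PEPTIDE_23NL6_START)
    if m:
        # Report-style coordinates: 1-based start, end = last residue + 1.
        print(m.start() + 1, m.end() + 1, m.group())
```

The match runs from W19 to P43 (WPTGESYHGQFYRDHCHGLGTYMWP), which is printed as 19 44 in the table's convention.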



(No Pfam data available for DKFZphfbr2_23nl6.1)
DKFZphfbr2_23o24



group: brain derived 

DKFZphfbr2_23o24 encodes a novel 139 amino acid protein with similarity to CAAX-box proteins.

The CAAX box is a prenyl group binding site found in a number of eukaryotic proteins,
including Ras and Ras-like proteins such as Rho, Rab, Rac, Ral and Rap, as well as nuclear
lamins A and B, some G protein alpha and gamma subunits and some DnaJ-like proteins. These
proteins are posttranslationally modified at this site by the attachment of either a
farnesyl or a geranylgeranyl group to a cysteine residue.
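Whether a peptide ends in a CAAX-type motif can be checked with a C-terminally anchored pattern. The sketch assumes PROSITE's prenylation signature PS00294, which we recall as C-{DENQ}-[LIVM]-x> (the '>' anchors the pattern at the C-terminus); verify it against the current PROSITE release before relying on it. `CAAX_BOX` and `has_caax_box` are hypothetical names.

```python
import re

# Assumed transcription of PROSITE PS00294 (prenylation, CAAX box):
# C-{DENQ}-[LIVM]-x>, where '>' anchors the pattern at the C-terminus.
CAAX_BOX = re.compile(r"C[^DENQ][LIVM].\Z")


def has_caax_box(peptide: str) -> bool:
    """True if the last four residues fit the C-{DENQ}-[LIVM]-x pattern."""
    return CAAX_BOX.search(peptide) is not None


if __name__ == "__main__":
    # Residues 121-139 of the DKFZphfbr2_23o24 peptide listed below.
    print(has_caax_box("QDGEVPCQPVLWWGSCSLK"))
```

Applied to the C-terminus of the DKFZphfbr2_23o24 peptide listed below, the last four residues, CSLK, fit this pattern.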

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to lectins 

complete cDNA, complete cds, EST hits

Sequenced by AGOWA

Locus: unknown

Insert length: 3564 bp

Poly A stretch at pos. 3541, no polyadenylation signal found



1 GAATGGCTCC GCAGATGGCC GGCACTGAGA GCCAGCAAGA AGCGGAGGAG
51 ATGGGCCTTC AGCAGGGGGT TGCGGGGGGA GCTTTAAACT GAGCCCTGTA 
101 AACATGGCAG AACTGCTCAG TGGGAGACTC TCAGCACAGA CGGTCATGGG 
151 GAAGTGAGTG CAGTTCATTT GTAATCTTGT TGTCGAGTTC TGGGTTTTTT 
201 TTGTTTGTTT CGTAACTTTA AAGGTATGCA CTTTATATAG ATTTATTTAT 
251 TTGCTGGGAC CGTTACTCAG AGTTCCTAGA AATGTACACA GCTTTTTTAC 
301 CAGGGTTACT CCTCAGAATC ACTTGTCACT TCTTTAAATG AATGAATGAA 
351 TGTGCCAGGC CCTATGCCTG GAGGTTGGGA GCTTCATCTA CATCACATTC 
401 TAACAGGTGA CCACTGGGGT AAGCACTGTG TGACTGCAAA GCCAGGGTGT 
451 GTTTCCATCA ACACCCAGAT GACCGTGCCT ATGTGCCCCT GTTGTCCTCC
501 CTCCAGGACT GCCTCCTCAC CCCACCCCTT TCTGCAGCTC CTCATCTAAA 
551 CATCTCGCCT GGTGAGGTCA CGGCTTAGCC TGTTGGCCAG TGGCCCCACC 
601 ACCATCCTTC CCCCTGTGCA GATTGGAGGA GGCCAGGTCT CTCCCCTTAG 
651 CTCCTATGTC CCCTTCACCC CCCATGGCAC AGATGAGACA TTCACAGAGT 
701 TTGCAGATGA TGGAAGAGAA GACTCCAGGT TGCCAGGTGT GTCCACTCTC 
751 AGGAACCCCC AGCCCAAGCC TCACTGCTCG TGTTCCCAGC CAACCCCAGC 
801 ACGGGGGATA CGCCGGTGCT GTTTCCCTGC TCAGATACAA CCAGTTACCA 
851 GAAACGACCT CACCCCTCCA ACCACTTTCC AAGGTGCCAG GACAGAGAAG 
901 CCCTTCACTG GCCCACCCAG GGCAGTTGAC AGAGGGATGC CCTCCTTGGA 
951 GGGGAGCCTC ACCTCTACCC ACAGGGCCGC GGCCTTGTCC TGGATTCTCA 
1001 CCGGGGCAGT CACGTCAGGA TGGAGAGGTC CCATGTCAGC CAGTTCTTTG 
1051 GTGGGGGTCA TGTAGTCTGA AATGACCTGC CGATGGTCCA GGCTGAGCCA 
1101 GGGAAGCTGA GCCTGGGTGC CTTTTTGGTG CCTACTCTGA CTTGAGTTGG 
1151 ATTCATGCCA CAGACCCACC TTCTTGAGCA ACAACACATA TAGCCACCAA 
1201 CACAAGAGCC AGGCACACAC TGAGCAGAGA AAGTCCCTGT CGCCTCACCA 
1251 CCCAAAAACT CCAGCTTTGC AGAGACCAAG GTTCTTCTCT ACCTTTGCAG 
1301 AAGCCTCTGT GACCAAACCC GGAGCTTGCC CTTCTGAGGC CTCTAGCATT 
1351 TCTCCAGGTG TTTTTCAGAG GACTTGGTTT AAATTTGTTC ACCCCAAATG 
1401 TGGTCTTTCC CGGATCATGA AAGGATCTGC CGCAAAGGTG AATCTGAGTC 
1451 TCCTCAGAGT CATATGAGAC TGAAACTGCT TATAACATTT CCGTGACCTA 
1501 ATAAGTCTTC CAAAAATGTA GGGTATTAAG AGTTTAGTGA CATTAAAAAG 
1551 TTTAGTCGAA AATATCGTGA TTCAGGTATA TTTAGACATT TGATTCATGC 
1601 CAAATTGCCA CTGTTAACAG AAAACACACC CCAAGCACAT TAATGCCTAG 
1651 ATATTTCAAA CCCTTTTCTG CCCACACATT CTTAAAAATA ATATACTGAG 
1701 AAATCTATAT ACAGGTTTTT TTTTAATTAG CTTGGAAAAG AGCAGTTGTA 
1751 TTCTGTTTGA ACAGCTGCTA ATGTCAATTC CTGTGGGAAG AAAGACCAAA 
1801 GAACATGGAG TTACACCAAG AATTTTAAAA CAAAGACGCT GTCCCTTTCC 
1851 TGAGCACCGT GCAGCCAAGA CTGAGAGATC AGTCTGAGAC CTGTGATTAA 
1901 GGAGTGTTTT CTACATAGCG TATAATTATG GAGCCACACA AGTGGGCCAT 
1951 TACTCTGTTG AGTGCTTCAT GTTTGAGGTA TTTTCGTGTT CCAACTTACA 
2001 TTAAAGTGTT TATAAAACAG GAAAAATCCA CGAGCAGGTA TTGACACTAT 
2051 CCATATTAGA TCATCACAAA ATTATATATA TAGCAGAGTC ATAAACAATG 
2101 AGAAACGGTC TTCCCACACT TGCTTTAAAT GGCCATGACC TAGTGTTTAG 
2151 GGAAAGCAGT AAAATCAGCG AGGAGCTCGT GGGAAAAATG AGACGGGCCC 
2201 TGAGGGGGTG ACTCATGGGC CAAGCAGGGC CACACAGGTA CCAGGCCGCC 
2251 ACGTCCTCTC CTGCCTCTCA CTCTCTGGAG ACTGGACTTC CTTTACTGCC 
2301 TCCTTTCTGA CATTTCCTAG ACATCAGACT TTGCTACTTA GTACACAAAC 
2351 GGGGTTCCCT TTTAAATTTG TTCACTCTAG TTAGCATTTG CAGAAGCTGT 
2401 GAAAAATTAC AGAGAGATGA TGTGTTGGGT AAGAGATGGT TTAAAAGTCC 






WO 01/12659 



2451 AGCTTGCTGT TTTTCATTAA GTGTCTTGAA AATGAGTAAG TGGCGTTCCT
2501 GGAGGGGAAC AATCATATAA TTCCGCAGGG TGGGTCTAAA CTTGTTTTCT
2551 GATAGTGTTT AGCAGCTCAT GGCTCTGAGG GCACCTGATA ACACAGCAGC
2601 CAGGCGCTGA TGAGAAGTGT GTGCCAGACA GACCCGAGTG TGGCTTGGCT
2651 CTTGCCTTAT GTTCCTTTCT CTGTTCAGAG AAGCGTGAGA TGAGATTTTG
2701 TGATTATATT GCACTCCTTG GGCTGACTTT CCCATGCACA GAATGTTTTA
2751 CACATCCTGA TAGCTGAGCT GAAAATGCAA AGAGAAGGGA AAATGCCTTA
2801 AATTGTTCTG GCTAATTTAG AAGCAGCAGG CCTTGGAAGT CTTTGTCCTG
2851 TGTCCCTGAA CAAATCTTAT GGGAGCTCTG GTACCTATGC CAGAAAATGC
2901 ACATAGGCAC AACACTTTTA CATACACGTT CACACACCCC ACCCTTATGG
2951 AGAACTTTTT TCTAAATAAG AGAAAGAAAA ATTTTAAGAC TTACAAGTTA
3001 TGTTTAGGTA TTTTACATGG TTCAGAAAAC AAGACATGAA GCGGTATAAA
3051 CTGAGAAGTC TTGTTCCCAC AACCCCACGT GCCAGGTACA CATAACCATT
3101 TTTATTCACC TCTAGCTTGT GCTTCCAATG TTTGTTAGGC ATATGTAAAT
3151 AAGTGAATAG ATAAGCATTT CTCCCTCCTT TTGCTGACAT GAGTGGTGGC
3201 ATGTTTTGCC CCTGGCTTTT ATCCCTTGAC CCCATTCCAG TACCTAGAGA
3251 CCTGCTTCAT TTTTTTAGAT GTGTAATACT TCATGTGTGC GTGTGCCTTA
3301 GTGATTAACT CGTGCACTGT GCAGGGACAT CGGGCTGGGA TCAGTTTGTT
3351 CACTGATATA TACAGCGCTG CGGGAGATAC CCTCACATGT GTATCATTTG
3401 GTCCATGTGC AGGTGTGTCT GGAAGATAGA ATTCTAGGCG TAGAATTGAT
3451 AGGTTAAATG TATTTATAGG GAAAAAATCA ATATAAAACT TTGCGTGTAA
3501 TGATATTTGC GTGCTTTTTT TTTTAATTTT TTTACCCAAA TAGTAAAAAA
3551 AAAAAAAAAA AAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 656 bp to 1072 bp; peptide length: 139 
Category: similarity to known protein 



1 MSPSPPMAQM RHSQSLQMME EKTPGCQVCP LSGTPSPSLT ARVPSQPQHG 
51 GYAGAVSLLR YNQLPETTSP LQPLSKVPGQ RSPSLAHPGQ LTEGCPPWRG 
101 ASPLPTGPRP CPGFSPGQSR QDGEVPCQPV LWWGSCSLK 

BLASTP hits 

Entry CEEGAP7_1 from database TREMBL:

gene: "EGAP7.1"; Caenorhabditis elegans cosmid EGAP7.

Score = 123, P = 2.3e-07, identities = 35/103, positives = 44/103

Entry MMBPC35_1 from database TREMBL:

Mouse carbohydrate binding protein 35 mRNA, 3' end.

Score = 113, P = 2.2e-06, identities = 40/103, positives = 44/103

Entry A28651 from database PIR:

galactose-specific lectin - mouse >TREMBL:MMMAC2A_1 Mouse mRNA for
Mac-2 antigen

Score = 113, P = 2.2e-06, identities = 40/103, positives = 44/103



Alert BLASTP hits for DKFZphfbr2_23o24, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_23o24, frame 2

Report for DKFZphfbr2_23o24.2

[LENGTH] 139

[MW] 14748.91

[pI] 8.90

[PROSITE] PRENYLATION 1

[PROSITE] MYRISTYL 1

[PROSITE] CK2_PHOSPHO_SITE 1

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] PKC_PHOSPHO_SITE 1

[KW] All_Alpha



SEQ MSPSPPMAQMRHSQSLQMMEEKTPGCQVCPLSGTPSPSLTARVPSQPQHGGYAGAVSLLR
PRD cccchhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccchhhhhhhh

SEQ YNQLPETTSPLQPLSKVPGQRSPSLAHPGQLTEGCPPWRGASPLPTGPRPCPGFSPGQSR
PRD hhcccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

SEQ QDGEVPCQPVLWWGSCSLK
PRD ccccccccccccccccccc



Prosite for DKFZphfbr2_23o24.2

PS00005 40->43 PKC_PHOSPHO_SITE PDOC00005

PS00006 119->123 CK2_PHOSPHO_SITE PDOC00006

PS00008 50->56 MYRISTYL PDOC00008

PS00013 126->137 PROKAR_LIPOPROTEIN PDOC00013

PS00294 136->140 PRENYLATION PDOC00266

(No Pfam data available for DKFZphfbr2_23o24.2)
DKFZphfbr2_23o5
group: brain derived 

DKFZphfbr2_23o5 encodes a novel 360 amino acid protein with no known similarity.

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

unknown

potential start at Bp 24 matches Kozak consensus ANNatgG
Sequenced by AGOWA
Locus: /map="7q21-q22"
Insert length: 1736 bp

Poly A stretch at pos. 1714, polyadenylation signal at pos. 1680
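The Kozak annotation can be checked directly against the sequence listing. The sketch below is a minimal check, assuming "ANNatgG" is read as an A three bases upstream of the ATG, any two bases, the ATG itself, and a G immediately after it; only those two flanking positions are tested. The function name `matches_kozak` is hypothetical, and coordinates are 1-based as in "potential start at Bp 24".

```python
def matches_kozak(seq: str, start: int) -> bool:
    """Check a start codon against the ANNatgG pattern.

    `start` is the 1-based position of the A in ATG.
    """
    s = start - 1  # convert to a 0-based index
    if s < 3 or s + 4 > len(seq):
        return False  # not enough flanking sequence to test -3 and +4
    seq = seq.upper()
    return seq[s:s + 3] == "ATG" and seq[s - 3] == "A" and seq[s + 3] == "G"


# First 35 bp of the DKFZphfbr2_23o5 insert, copied from its sequence listing.
five_prime = "GGGGGAGGATCAAAGTAGGCAAGATGGCGTCGAGC"
print(matches_kozak(five_prime, 24))  # True
```

A fuller scorer would also weight the other flanking bases; this sketch only encodes the two positions the consensus string names in capitals.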

1 GGGGGAGGAT CAAAGTAGGC AAGATGGCGT CGAGCGGCGG GGAGCCAGGG 
51 AGTTTATTTG ATCACCACGT CCAGAGGGCG GTATGCGACA CACGGGCCAA 

101 ATATCGAGAG GGACGACGGC CTCGTGCTGT GAAGGTATAT ACAATCAATT 

151 TGGAATCTCA GTACTTATTA ATACAAGGAG TTCCTGCTGT GGGAGTCATG 

201 AAGGAATTAG TTGAGCGATT CGCTTTATAT GGTGCAATTG AACAGTACAA 

251 TGCTCTAGAT GAATACCCAG CAGAAGACTT TACTGAAGTT TATCTTATTA 

301 AATTTATGAA CTTACAAAGT GCAAGGACAG CCAAGAGAAA AATGGATGAA 

351 CAGAGTTTCT TCGGTGGATT GCTTCATGTG TGCTATGCTC CAGAATTTGA 

401 AACAGTTGAA GAAACTAGAA AAAAACTACA AATGCGGAAG GCATATGTAG 

451 TAAAAACTAC TGAAAATAAA GACCATTACG TGACAAAGAA GAAATTGGTT 

501 ACAGAGCATA AAGACACAGA GGATTTTAGA CAAGACTTCC ACTCAGAGAT 

551 GTCTGGATTT TGTAAAGCTG CTTTGAACAC TTCTGCAGGG AACTCAAATC 

601 CTTATCTTCC GTATTCCTGT GAATTGCCTT TATGTTATTT CTCCTCAAAA 

651 TGTATGTGTT CATCCGGGGG ACCTGTAGAC AGAGCACCAG ACTCCTCTAA 

701 GGATGGTAGA AACCATCATA AAACAATGGG GCATTATAAC CACAATGACT 

751 CTTTGCGGAA AACACAGATA AACTCTTTGA AAAACTCAGT GGCCTGCCCT 

801 GGTGCACAAA AGGCTATTAC GTCTTCAGAG GCAGTTGACA GATTTATGCC 

851 TAGGACAACA CAACTGCAGG AGCGCAAAAG AAGAAGAGAA GATGATCGTA 

901 AACTTGGAAC TTTTCTTCAA ACAAACCCAA CTGGTAATGA GATTATGATT 

951 GGACCTCTGT TACCAGACAT CTCTAAAGTG GATATGCACG ATGACTCATT 
1001 GAATACAACG GCGAATTTAA TTCGGCATAA ACTTAAAGAG GTATTTCATC 
1051 TGTGCCAAAG CCTCCAGAGG ACAAGCCAGA AGATGTACAT ACAAGTCATC 
1101 CATTAAAACA AAGAAGAAGA ATATAGAGTG CCAGCAGCAA CTTAGTATTT 
1151 TCTAAAAAGA ACATTTATTA TTTATTTTTA GCCTGTCATT TTAATTCTTC 
1201 AAGAGATTTT ACTGCTGGTA TTTTTTGATG CACTCCTCTT TGTAATTTCA 
1251 TTCAAGCCAT TTGTCTAAAG TCATTTCTTT GTTTTTTGGG AGATGGAGTC 
1301 TTGCTCTGTT GCCCAGGCTG GAATGCAGTG GCGTGATCTC GGCTCACTGC 
1351 AACCTCCACC TCCCGGGTTC AAGCGATTCT CCTGCCTCAG CCTCCTGAGT 
1401 ATCTGGGATT ACAGGCGTGC ACCACCATGC CTGGCTAAGT TTTGTGTTTT 
1451 TTTTAGTAGA GATGGGTTTT CACCATATTG GTCAGGCTGG TCTCGAACTC 
1501 CTGACCTTGT GATACACCTG CCTCAGCCTC CCAAAGGGAT GAGCCACCGC 
1551 GCCTGGCCCA TTTCTTCTTT TTTTGACCCA TACTTAATGT TGCAGAAACT 
1601 ATTCTTGTCA TAACATTATC TCTCATGTAC AGTAATTATA TGTAAATTAA 
1651 TTGAAGCAAA TATGGAAACT TTACAATAGA AATAAAGATA GGCAGCCAGC 
1701 GTCTGTTTCC AATTATAAAA AAAAAAAAAA AAAAAA 



BLAST Results 



Entry AC005156 from database EMBL: 

Homo sapiens PAC clone DJ1099C19 from 7q21-q22, complete sequence. 
Score = 2897, P = 2.4e-154, identities = 583/586 
2 exons covering Bp 465-1723 



Medline entries 



No Medline entry 



Peptide information for frame 3 
ORF from 24 bp to 1103 bp; peptide length: 360
Category: similarity to unknown protein 



1 MASSGGEPGS LFDHHVQRAV CDTRAKYREG RRPRAVKVYT INLESQYLLI 
51 QGVPAVGVMK ELVERFALYG AIEQYNALDE YPAEDFTEVY LIKFMNLQSA 
101 RTAKRKMDEQ SFFGGLLHVC YAPEFETVEE TRKKLQMRKA YVVKTTENKD 
151 HYVTKKKLVT EHKDTEDFRQ DFHSEMSGFC KAALNTSAGN SNPYLPYSCE 
201 LPLCYFSSKC MCSSGGPVDR APDSSKDGRN HHKTMGHYNH NDSLRKTQIN 
251 SLKNSVACPG AQKAITSSEA VDRFMPRTTQ LQERKRRRED DRKLGTFLQT 
301 NPTGNEIMIG PLLPDISKVD MHDDSLNTTA NLIRHKLKEV FHLCQSLQRT 
351 SQKMYIQVIH 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_23o5, frame 3 

TREMBL:AC005824_10 gene: "F15K20.11"; Arabidopsis thaliana chromosome
II BAC F15K20 genomic sequence, complete sequence., N = 2, Score = 114,
P = 3.6e-11

>TREMBL:AC005824_10 gene: "F15K20.11"; Arabidopsis thaliana chromosome II
BAC F15K20 genomic sequence, complete sequence.
Length = 227

HSPs:

Score = 114 (17.1 bits), Expect = 3.6e-11, Sum P(2) = 3.6e-11
Identities = 21/41 (51%), Positives = 29/41 (70%)

Query: 103 AKRKMDEQSFFGGLLHVCYAPEFETVEETRKKLQMRKAYVV 143
           AKRK+DE SF G  L + YAPE+E V +T+ KL+ R+  V+
Sbjct:  51 AKRKLDESSFLGNRLQISYAPEYENVNDTKDKLESRRKEVL 91
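The identity and positive counts in an HSP header follow from the midline between the Query and Sbjct rows: a letter marks an identical residue and '+' a conservative substitution. A minimal sketch, using the midline of the first HSP of DKFZphfbr2_23o5 (query 103-143) written out with one character per alignment column; `hsp_counts` is a hypothetical helper. Integer division reproduces the percentages printed in these reports (29/41 is shown as 70%, not 71%), which suggests they are truncated rather than rounded.

```python
def hsp_counts(midline: str) -> tuple[int, int]:
    """Count identities and positives in a BLAST midline.

    Letters are identical residues, '+' marks a conservative
    substitution, and blanks are everything else.
    """
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


midline = "AKRK+DE SF G  L + YAPE+E V +T+ KL+ R+  V+"
length = 41  # aligned length of this HSP (query 103-143)
ident, pos = hsp_counts(midline)
print(f"Identities = {ident}/{length} ({100 * ident // length}%), "
      f"Positives = {pos}/{length} ({100 * pos // length}%)")
# Identities = 21/41 (51%), Positives = 29/41 (70%)
```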




Score = 107 (16.1 bits), Expect = 2.6e-10, Sum P(2) = 2.6e-10
Identities = 50/191 (26%), Positives = 83/191 (43%)

Query: 103 AKRKMDEQSFFGGLLHVCYAPEFETVEETRKKLQMRKAYVVKTTENKDHYVTKKKLVTEH 162
           AKRK+DE SF G  L + YAPE+E V +T+ KL+ R+  V+     +    T +  VT+
Sbjct:  51 AKRKLDESSFLGNRLQISYAPEYENVNDTKDKLESRRKEVLARLNPQKEKSTSQ--VTKL 108

Query: 163 KDTEDFRQDFHSEMSGFCKAALNTSAGNSNPYLPYSCELPLCYFSSKCMCSSGGPVDRAP 222
                 + D  S      +   +   GN+ P    S +    YF+S  M  +   V
Sbjct: 109 AGPALTQTDNVSSQRREMEYQFHR--GNA-PVTRVSSDQE--YFASSSMNQTVKTV---- 159

Query: 223 DSSKDGRNHHKTMGHYNHNDSLRKTQINSLKNSVACPGAQKAITSSEAVDRFMPRTTQLQ 282
              K  +   + +   +H   + ++  N  +     P +Q     S    R  P ++Q+Q
Sbjct: 160 -REKLNKTREENISSLSHCKQIEESG-NQKRLQ---PSSQTQPEESGNQKRLQP-SSQIQ 213

Query: 283 -ERKRRREDDRK 293
            + KR R D+R+
Sbjct: 214 PDLKRTRVDNRR 225





Score = 102 (15.3 bits), Expect = 3.6e-11, Sum P(2) = 3.6e-11
Identities = 22/55 (40%), Positives = 38/55 (69%)

Query:  26 KYREGRRPRAVKVYTINLESQYLLIQGVPAVGVMKELVERFALYGAIEQY--NALDE 80
           +Y++   P AV+VYT+  ES+Y++++ VPA+G   +L+  F  YG +E++    LDE
Sbjct:   3 RYKD-ETP-AVRVYTVCDESRYMIVRNVPALGCGDDLMRLFMTYGEVEEFAKRKLDE 57



Pedant information for DKFZphfbr2_23o5, frame 3 



Report for DKFZphfbr2_23o5.3

[LENGTH] 360

[MW] 41105.85

[pI] 8.89

[HOMOL] TREMBL:AC005824_10 gene: "F15K20.11"; Arabidopsis thaliana chromosome II BAC
F15K20 genomic sequence, complete sequence. 5e-12

[PROSITE] AMIDATION 1

[PROSITE] MYRISTYL 2

[PROSITE] CK2_PHOSPHO_SITE 7

[PROSITE] PKC_PHOSPHO_SITE 9

[PROSITE] ASN_GLYCOSYLATION 3

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 4.17 %



SEQ MASSGGEPGSLFDHHVQRAVCDTRAKYREGRRPRAVKVYTINLESQYLLIQGVPAVGVMK 

SEG 

PRD ccccccccceeeecceeeeehhhhhhhhhccccceeeeeeecccceeeeeeccccchhhh

SEQ ELVERFALYGAIEQYNALDEYPAEDFTEVYLIKFMNLQSARTAKRKMDEQSFFGGLLHVC 

SEG 

PRD hhhhhhhhhhhhhhhhhhccccccceeeeeeehhhhhhhhhhhhhhhhhccccccceeee 

SEQ YAPEFETVEETRKKLQMRKAYVVKTTENKDHYVTKKKLVTEHKDTEDFRQDFHSEMSGFC 

SEG 

PRD eccchhhhhhhhhhhhhhhhheeeeccccceeeeeeeeeeeccccchhhhhhhhhcccce 

SEQ KAALNTSAGNSNPYLPYSCELPLCYFSSKCMCSSGGPVDRAPDSSKDGRNHHKTMGHYNH 

SEG 

PRD eeeeccccccccccccccccccceeecccccccccccccccccccccccccccccccccc 

SEQ NDSLRKTQINSLKNSVACPGAQKAITSSEAVDRFMPRTTQLQERKRRREDDRKLGTFLQT 

SEG xxxxxxxxxxxxxxx 

PRD cccceeeeccccccccccccceeeeecceeeeeccccchhhhhhhhhhhhccceeeeeec 

SEQ NPTGNEIMIGPLLPDISKVDMHDDSLNTTANLIRHKLKEVFHLCQSLQRTSQKMYIQVIH 

SEG 

PRD cccccceeeecccccccccccccccccchhhhhhhhhhhhhhhhhhhhhcchhhhhhccc 



Prosite for DKFZphfbr2_23o5.3

PS00001 185->189 ASN_GLYCOSYLATION PDOC00001
PS00001 241->245 ASN_GLYCOSYLATION PDOC00001
PS00001 327->331 ASN_GLYCOSYLATION PDOC00001
PS00005 99->102 PKC_PHOSPHO_SITE PDOC00005
PS00005 102->105 PKC_PHOSPHO_SITE PDOC00005
PS00005 131->134 PKC_PHOSPHO_SITE PDOC00005
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005
PS00005 207->210 PKC_PHOSPHO_SITE PDOC00005
PS00005 224->227 PKC_PHOSPHO_SITE PDOC00005
PS00005 243->246 PKC_PHOSPHO_SITE PDOC00005
PS00005 251->254 PKC_PHOSPHO_SITE PDOC00005
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005
PS00006 4->8 CK2_PHOSPHO_SITE PDOC00006
PS00006 10->14 CK2_PHOSPHO_SITE PDOC00006
PS00006 127->131 CK2_PHOSPHO_SITE PDOC00006
PS00006 224->228 CK2_PHOSPHO_SITE PDOC00006
PS00006 266->270 CK2_PHOSPHO_SITE PDOC00006
PS00006 303->307 CK2_PHOSPHO_SITE PDOC00006
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006
PS00008 5->11 MYRISTYL PDOC00008
PS00008 260->266 MYRISTYL PDOC00008
PS00009 29->33 AMIDATION PDOC00009

(No Pfam data available for DKFZphfbr2_23o5.3)
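The Prosite hits listed for DKFZphfbr2_23o5 can be reproduced with ordinary regular expressions. A minimal sketch for the PS00001 N-glycosylation pattern N-{P}-[ST]-{P}, using the peptide copied from this entry's frame-3 peptide listing; `scan` is a hypothetical helper. In these tables the second coordinate appears to point one residue past the end of the match (a four-residue PS00001 hit is listed as 185->189, a three-residue PS00005 hit as 99->102), and the sketch follows that convention.

```python
import re

# DKFZphfbr2_23o5 frame-3 peptide, copied from the peptide listing.
PEPTIDE = "".join("""
MASSGGEPGS LFDHHVQRAV CDTRAKYREG RRPRAVKVYT INLESQYLLI
QGVPAVGVMK ELVERFALYG AIEQYNALDE YPAEDFTEVY LIKFMNLQSA
RTAKRKMDEQ SFFGGLLHVC YAPEFETVEE TRKKLQMRKA YVVKTTENKD
HYVTKKKLVT EHKDTEDFRQ DFHSEMSGFC KAALNTSAGN SNPYLPYSCE
LPLCYFSSKC MCSSGGPVDR APDSSKDGRN HHKTMGHYNH NDSLRKTQIN
SLKNSVACPG AQKAITSSEA VDRFMPRTTQ LQERKRRRED DRKLGTFLQT
NPTGNEIMIG PLLPDISKVD MHDDSLNTTA NLIRHKLKEV FHLCQSLQRT
SQKMYIQVIH
""".split())

# PS00001 pattern N-{P}-[ST]-{P}; the lookahead lets overlapping sites match.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def scan(pattern: re.Pattern, seq: str, width: int) -> list[str]:
    """Return hits as 'start->end': 1-based start, end = start + width."""
    return [f"{m.start() + 1}->{m.start() + 1 + width}" for m in pattern.finditer(seq)]


print(len(PEPTIDE))              # 360
print(scan(N_GLYC, PEPTIDE, 4))  # ['185->189', '241->245', '327->331']
```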
DKFZphfbr2_2a2



group: brain derived 

DKFZphfbr2_2a2.3 encodes a novel 167 amino acid protein with weak similarity to human 52K
autoantigen Ro/SS-A.

The novel protein contains a C3HC4 zinc finger ("RING finger") motif.

This domain is probably involved in mediating protein-protein interactions.

Proteins containing a RING finger include mammalian V(D)J recombination activating protein
(RAG1), mouse rpt-1, human rfp, the human 52 kDa Ro/SS-A protein and others.

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to 52K autoantigen Ro/SS-A - human 

complete cDNA, complete cds, few EST hits 

Sequenced by Qiagen 

Locus: unknown

Insert length: 1376 bp

Poly A stretch at pos. 1355, polyadenylation signal at pos. 1340
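A polyadenylation signal position can be located by searching the 3' end of the insert for the canonical AATAAA hexamer. A minimal sketch, assuming AATAAA is the only motif of interest (variants such as ATTAAA are ignored) and that the occurrence closest to the poly(A) stretch is the relevant one; `find_polya_signal` is a hypothetical helper. The fragment is bp 1301-1376 of this insert, copied from its sequence listing. The function returns 1341, the 1-based position of the first A of AATAAA, while the annotation gives 1340; the same one-position offset holds for the other entries here (1681 versus 1680 for DKFZphfbr2_23o5), so the annotated positions may be 0-based.

```python
from typing import Optional


def find_polya_signal(seq: str, first_pos: int, motif: str = "AATAAA") -> Optional[int]:
    """Return the 1-based position of the last `motif` in `seq`, or None.

    `first_pos` is the 1-based coordinate of seq[0], so a fragment can be
    scanned while keeping the numbering of the full insert.
    """
    idx = seq.upper().rfind(motif)
    return None if idx == -1 else first_pos + idx


# bp 1301-1376 of the DKFZphfbr2_2a2 insert.
tail = "".join("""
TAATATATAG CTGTGAAACT CATCTAAATA TTTTTGTTCC AATAAAATAT
TATATACTAA AAAAAAAAAA AAAAAA
""".split())
print(find_polya_signal(tail, 1301))  # 1341
```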



1 GGGGACTCCA AATTAGAAAG GGGACGTCTA GTGGGTTGCC CGGGAGGGGT 

51 GGCGGGAGCG GTCCTGGAAA TAATCTGTCC TCTGTCGCCG GGAACTGGCG 

101 AGGTAGTTCC TTCGCGGTGG AGAGACCTGG AATGGCCAAA TATCAAGGTG 

151 AAGTTCAAAG TTTGAAACTG GATGATGATT CAGTTATAGA AGGAGTAAGC 

201 GACCAAGTAC TTGTGGCAGT TGTGGTCAGT TTCGCTTTGA TTGCTACCCT 

251 GGTATATGCA CTTTTCAGAA ATGTACATCA AAACATTCAC CCAGAAAACC 

301 AGGAGCTAGT AAGGGTACTT CGAGAACAGC TTCAAACAGA ACAGGATGCA 

351 CCTGCTGCCA CTCGACAGCA GTTCTACACT GACATGTACT GTCCCATCTG 

401 CCTGCACCAA GCCTCCTTCC CGGTGGAGAC CAACTGTGGA CATCTTTTTT 

451 GTGGTGCCTG CATTATTGCT TACTGGCGAT ATGGTTCATG GCTTGGGGCA 

501 ATCAGTTGTC CAATCTGTAG ACAAACGGTA ACCTTACTCC TAACAGTATT 

551 TGGTGAAGAT GATCAGTCTC AGGATGTTCT GAGATTGCAT CAGGATATTA 

601 ATGATTATAA CCGGAGATTC TCAGGGCAAC CCTGATCTAT TATGGAGAGA 

651 ATTATGGATC TACCCACTTT ACTGAGGCAT GCATTCAGGG AAATGTTTTC 

701 AGTCGGGGGC CTTTTCTGGA TGTTTCGCAT CAGGATAATA CTTTGTTTAA 

751 TGGGAGCTTT TTTCTATCTT ATATCACCTC TAGATTTTGT ACCTGAAGCC 

801 TTGTTTGGAA TTCTAGGCTT TCTAGATGAT TTCTTTGTCA TCTTTTTATT 

851 GCTTATCTAC ATCTCTATTA TGTATCGAGA AGTGATAACC CAAAGGCTAA 

901 CTAGATGAAA AAGGAAACAA AACTGAGTTT ACTAGGATAT CTGAGCTAAT 

951 GTAGAACATC AAACAGAAGG ACCCATGGCA GTATAAAGCA ATGAAGCAAT 

1001 GGAGTATTAT CTCACAAATA TAAAACCACT ATAAGACAAA CATTTGATTA 

1051 TCATTTGACA AATACCTAGG TATAACTGGA ATTTTCATGT TTGAAGTTCT 

1101 AATATTAAGT TTAGAATTAT AATGATCTAC AGTTGTATCT TGATTCTATG 

1151 TTGTCTGGAA AAAATATGGA ATTATATAAA AAGGGATGCT TTTATATATT 

1201 TTTCTTTTCC CCAGAATTAC TTAGATTAAT TAGATGTATA GTAAAATATT 

1251 GTTAAATGTC AGTTTATCCA TCTTATCCTT CTCAGCAGGT ACCTATATGA 

1301 TAATATATAG CTGTGAAACT CATCTAAATA TTTTTGTTCC AATAAAATAT 

1351 TATATACTAA AAAAAAAAAA AAAAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 132 bp to 632 bp; peptide length: 167 
Category: similarity to known protein 
Classification: unset 
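The ORF coordinates and peptide lengths in these entries are related by simple arithmetic: an ORF from bp s to bp e encodes (e - s + 1) / 3 residues, for example (632 - 132 + 1) / 3 = 167. The sketch below assumes the standard genetic code and 1-based inclusive coordinates. It checks that arithmetic and translates two stretches of the DKFZphfbr2_2a2 insert copied from its sequence listing: bp 132-200 (the start of the ORF) and bp 621-635. The second stretch ends in TGA at bp 633-635, just past the stated ORF end of 632, so the ORF coordinates here appear to exclude the stop codon. `translate` and `peptide_length` are hypothetical helpers.

```python
from itertools import product

# Standard genetic code: codons enumerated with the first base varying
# slowest, in T, C, A, G order, matching this 64-letter string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residue count for an ORF given 1-based inclusive coordinates."""
    return (orf_end - orf_start + 1) // 3


print(peptide_length(132, 632))  # 167
print(translate("ATGGCCAAATATCAAGGTGAAGTTCAAAGTTTGAAACTGGATGATGATTCAGTTATAGAAGGAGTAAGC"))
# MAKYQGEVQSLKLDDDSVIEGVS
print(translate("TCAGGGCAACCCTGA"))  # SGQP*
```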






PCT/ffiOO/01496 



Prosite motifs: ZINC_FINGER_C3HC4 (102-112) 



1 MAKYQGEVQS LKLDDDSVIE GVSDQVLVAV VVSFALIATL VYALFRNVHQ
51 NIHPENQELV RVLREQLQTE QDAPAATRQQ FYTDMYCPIC LHQASFPVET 
101 NCGHLFCGAC IIAYWRYGSW LGAISCPICR QTVTLLLTVF GEDDQSQDVL 
151 RLHQDINDYN RRFSGQP 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2a2, frame 3 

TREMBL:CEY38F1A_8 gene: "Y38F1A.2"; Caenorhabditis elegans cosmid
Y38F1A, N = 1, Score = 194, P = 2e-15

PIR:T05222 hypothetical protein F17I5.130 - Arabidopsis thaliana, N =
1, Score = 159, P = 1.4e-10

TREMBLNEW:AB025011_1 gene: "TRIF"; product: "Trif-d"; Mus musculus
mRNA for Trif-d, complete cds., N = 1, Score = 108, P = 2.6e-06

PIR:A37241 52K autoantigen Ro/SS-A - human, N = 1, Score = 115, P =
5e-05

>TREMBL:CEY38F1A_8 gene: "Y38F1A.2"; Caenorhabditis elegans cosmid Y38F1A
Length = 283

HSPs: 

Score = 194 (29.1 bits), Expect = 2.0e-15, P = 2.0e-15
Identities = 52/149 (34%), Positives = 78/149 (52%)

Query:  16 DSVIEGVSDQVLVAVVVSFALIATLVYALFRNVHQNIHPENQELVRVLREQLQTEQDAPA 75
           D  +E ++ Q+ +A+ V F ++  +  A      Q       E     R Q+ T++
Sbjct:  41 DPDVE-LATQITMAIAVIF-IVKAIFDAWQSRRRQRAASRMDENAE--RNQIITQRRISE 96

Query:  76 ATRQQFYTDMYCPICLHQASFPVETNCGHLFCGACIIAYWRYGSWLGA-ISCPICRQTVT 134
           A  Q  +    CPICL  ASFPV T+CGH+FC  CII YW+    +     C +CR T
Sbjct:  97 ALHQSSHE---CPICLANASFPVLTDCGHIFCCECIIQYWQQSKAIVTPCDCAMCRSTFY 153

Query: 135 LLLTV----FGEDDQSQDVLRLHQ-DINDYNRRFS 164
           +LL V     G  +++ D ++ +   I+DYNRRFS
Sbjct: 154 MLLPVHWPTMGTSEETDDHIQENNIRIDDYNRRFS 188

Pedant information for DKFZphfbr2_2a2, frame 3 



Report for DKFZphfbr2_2a2.3

[LENGTH] 167

[MW] 18941.65

[pI] 4.91

[HOMOL] TREMBL:CEY38F1A_8 gene: "Y38F1A.2"; Caenorhabditis elegans cosmid Y38F1A 1e-

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDR265w] 1e-04

[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR265w] 1e-04

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR323c] 2e-04

[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins

[PROSITE] ZINC_FINGER_C3HC4 1

[PFAM] Zinc finger, C3HC4 type (RING finger)

[KW] Irregular

[KW] 3D

[KW] LOW_COMPLEXITY 6.59 %

SEQ MAKYQGEVQSLKLDDDSVIEGVSDQVLVAVVVSFALIATLVYALFRNVHQNIHPENQELV 

SEG xxxxxxxxxxx 

1rmd-

SEQ RVLREQLQTEQDAPAATRQQFYTDMYCPICLHQASFPVETNCGHLFCGACIIAYWRYGSW

SEG 

1rmd- HHHHHHBTTTTTEETTTEEEETTTEEEEHHHHH HHHHH



SEQ LGAISCPICRQTVTLLLTVFGEDDQSQDVLRLHQDINDYNRRFSGQP 
SEG

1rmd- HCCB-TTTTT



Prosite for DKFZphfbr2_2a2.3
PS00518 102->112 ZINC_FINGER_C3HC4 PDOC00449
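The single Prosite hit (102->112) can be reproduced by converting the PROSITE pattern to a regular expression. A minimal sketch: `prosite_to_regex` is a hypothetical helper that handles only the basic pattern syntax (x, single residues, [..] alternatives, {..} exclusions, (n) and (n,m) repeats, < and > anchors), and the PS00518 pattern string below is an assumption to be checked against the PROSITE entry. The peptide is copied from this entry's frame-3 peptide listing; the reported end is one past the last matched residue, as in the table.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern to a Python regular expression."""
    parts = []
    for token in pattern.rstrip(".").split("-"):
        m = re.fullmatch(
            r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)", token)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {token!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # excluded residues
        if lo:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


RING = "C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]"  # assumed PS00518 signature

PEPTIDE_2A2 = "".join("""
MAKYQGEVQS LKLDDDSVIE GVSDQVLVAV VVSFALIATL VYALFRNVHQ
NIHPENQELV RVLREQLQTE QDAPAATRQQ FYTDMYCPIC LHQASFPVET
NCGHLFCGAC IIAYWRYGSW LGAISCPICR QTVTLLLTVF GEDDQSQDVL
RLHQDINDYN RRFSGQP
""".split())

regex = prosite_to_regex(RING)
hits = [(m.start() + 1, m.end() + 1) for m in re.finditer(regex, PEPTIDE_2A2)]
print(regex)  # C.H.[LIVMFY]C.{2}C[LIVMYA]
print(hits)   # [(102, 112)]
```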



Pfam for DKFZphfbr2_2a2.3

HMM_NAME Zinc finger, C3HC4 type (RING finger)

HMM *CPICFcTFQlDyPWPFdePmMlPCgH3FCypCIrrW CP
CPIC L+ P++++CGH+FC +CI+ + CP
Query 87 CPIC LHQ ASFPVETNCGHLFCGACI IAYWRYGSWLGAI SCP 127

HMM mC*
+C
Query 128 IC 129
DKFZphfbr2_2bl7



group: transmembrane protein 

DKFZphfbr2_2bl7 encodes a novel 285 amino acid protein with similarity to D. melanogaster 30K 
protein. 

The protein contains 3 transmembrane regions. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 
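The source does not say how the three transmembrane regions were predicted. As an illustration of the general idea only, the sketch below uses a different and much simpler technique, a Kyte-Doolittle hydropathy scan: it averages residue hydropathy over a sliding window and reports stretches whose average reaches a threshold. The window of 19 and threshold of 1.6 are commonly quoted starting values assumed here, not taken from this document; `hydrophobic_segments` is a hypothetical helper.

```python
# Kyte-Doolittle hydropathy values for the 20 standard residues.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq: str, window: int = 19,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """Return 1-based (start, end) spans covered by windows whose mean
    hydropathy is at least `threshold`; overlapping or touching windows
    are merged into one span."""
    spans: list[tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        score = sum(KD[aa] for aa in seq[i:i + window]) / window
        if score >= threshold:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)  # extend the current span
            else:
                spans.append((start, end))
    return spans


# Synthetic check: a 25-residue leucine block (positions 21-45) between lysines.
print(hydrophobic_segments("K" * 20 + "L" * 25 + "K" * 20))  # [(16, 50)]
```

The reported span overhangs the leucine block by a few residues on each side, a known side effect of window averaging. Running the function on the 285-residue peptide of this entry gives candidate spans that can be compared with the MEM track of its Pedant report.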



similarity to Drosophila hypothetical 30K protein 

complete cDNA, complete cds, EST hits 
TRANSMEMBRANE 3 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1426 bp 

Poly A stretch at pos. 1345, polyadenylation signal at pos. 1330 



1 GGGGGTATTT CCAAGGACTC CAAAGCGAGG CCGGGGACTG AAGGTGTGGG 
51 TGTCGAGCCC TCTGGCAGAG GGTTAACCTG GGTCAAATGC ACGGATTCTC 
101 ACCTCGTACA GTTACGCTCT CCCGCGGCAC GTCCGCGAGG ACTTGAAGTC 
151 CTGAGCGCTC AAGTTTGTCC GTAGGTCGAG AGAAGGCCAT GGAGGTGCCG 
201 CCACCGGCAC CGCGGAGCTT TCTCTGTAGA GCATTGTGCC TATTTCCCCG 
251 AGTCTTTGCT GCCGAAGCTG TGACTGCCGA TTCGGAAGTC CTTGAGGAGC 
301 GTCAGAAGCG GCTTCCCTAC GTCCCAGAGC CCTATTACCC GGAATCTGGA 
351 TGGGACCGCC TCCGGGAGCT GTTTGGCAAA GATGAACAGC AGAGAATTTC 
401 AAAGGACCTT GCTAATATCT GTAAGACGGC GGCTACAGCA GGCATCATTG 
451 GCTGGGTGTA TGGGGGAATA CCAGCTTTTA TTCATGCTAA ACAACAATAC 
501 ATTGAGCAGA GCCAGGCAGA AATTTATCAT AACCGGTTTG ATGCTGTGCA 
551 ATCTGCACAT CGTGCTGCCA CACGAGGCTT CATTCGTTAT GGCTGGCGCT 
601 GGGGTTGGAG AACTGCAGTG TTTGTGACTA TATTCAACAC AGTGAACACT 
651 AGTCTGAATG TATACCGAAA TAAAGATGCC TTAAGCCATT TTGTAATTGC 
701 AGGAGCTGTC ACGGGAAGTC TTTTTAGGAT AAACGTAGGC CTGCGTGGCC 
751 TGGTGGCTGG TGGCATAATT GGAGCCTTGC TGGGCACTCC TGTAGGAGGC 
801 CTGCTGATGG CATTTCAGAA GTACTCTGGT GAGACTGTTC AGGAAAGAAA 
851 ACAGAAGGAT CGAAAGGCAC TCCATGAGCT AAAACTGGAA GAGTGGAAAG 
901 GCAGACTACA AGTTACTGAG CACCTCCCTG AGAAAATTGA AAGTAGTTTA 
951 CAGGAAGATG AACCTGAGAA TGATGCTAAG AAAATTGAAG CACTGCTAAA 
1001 CCTTCCTAGA AACCCTTCAG TAATAGATAA ACAAGACAAG GACTGAAAGT 
1051 GCTCTGAACT TGAAACTCAC TGGAGAGCTG AAGGGAGCTG CCATGTCCGA 
1101 TGAATGCCAA CAGACAGGCC ACTCTTTGGT CAGCCTGCTG ACAAATTTAA 
1151 GTGCTGGTAC CTGTGGTGGC AGTGGCTTGC TCTTGTCTTT TTCTTTTCTT 
1201 TTTAACTAAG AATGGGGCTG TTGTACTCTC ACTTTACTTA TCCTTAAATT 
1251 TAAATACATA CTTATGTTTG TATTAATCTA TCAATATATG CATACATGAA 
1301 TATATCCACC CACCTAGATT TTAAGCAGTA AATAAAACAT TTCGCAAAAG 
1351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1401 AAAAAAAAAA AAAAAAAAAA AAAAAA 



BLAST Results 



Entry HSG19630 from database EMBL: 
human STS A001T27. 
Score = 961, P = 1.2e-36, identities = 193/194



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 189 bp to 1043 bp; peptide length: 285 
Category: similarity to unknown protein 
1 MEVPPPAPRS FLCRALCLFP RVFAAEAVTA DSEVLEERQK RLPYVPEPYY
51 PESGWDRLRE LFGKDEQQRI SKDLANICKT AATAGIIGWV YGGIPAFIHA
101 KQQYIEQSQA EIYHNRFDAV QSAHRAATRG FIRYGWRWGW RTAVFVTIFN
151 TVNTSLNVYR NKDALSHFVI AGAVTGSLFR INVGLRGLVA GGIIGALLGT 
201 PVGGLLMAFQ KYSGETVQER KQKDRKALHE LKLEEWKGRL QVTEHLPEKI 
251 ESSLQEDEPE NDAKKIEALL NLPRNPSVID KQDKD 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2bl7, frame 3 

PIR:JQ1024 hypothetical 30K protein (DmRP140 5' region) - fruit fly
(Drosophila melanogaster), N = 1, Score = 312, P = 6.1e-28

>PIR:JQ1024 hypothetical 30K protein (DmRP140 5' region) - fruit fly
(Drosophila melanogaster)
Length = 261

HSPs:

Score = 312 (46.8 bits), Expect = 6.1e-28, P = 6.1e-28
Identities = 68/231 (29%), Positives = 125/231 (54%)



Query:  30 ADSEVLEERQKRLPYVPEPYYPESGWDRLRELFGKDEQQRISKDLANICKTAATAGIIGW 89
           AD  V +E +    ++      E+G +RL+++F  DE   I  +L ++ +      +IG
Sbjct:  23 ADEIVDKENKTYKAFLASKPPEETGLERLKQMFTIDEFGSIFSELNSVYQAGFLGFLIGA 82

Query:  90 VYGGIPAFIHAKQQYIEQSQAEIYHNRFDAVQSAHRAATRGFIRYGWRWGWRTAVFVTIF 149
           +YGG+     A   ++E +QA  + + FDA +      T  F + G++WGWR  +F T +
Sbjct:  83 IYGGVTQSRVAYMNFMENNQATAFKSHFDAKKKLQDQFTVNFAKGGFKWGWRVGLFTTSY 142

Query: 150 NTVNTSLNVYRNKDALSHFVIAGAVTGSLFRINVGLRGLVAGGIIGALLGTPVGGLLMAF 209
             + T ++VYR K ++  ++ AG++TGSL+++++GLRG+ AGGIIG  LG   G   +
Sbjct: 143 FGIITCMSVYRGKSSIYEYLAAGSITGSLYKVSLGLRGMAAGGIIGGFLGGVAGVTSLLL 202

Query: 210 QKYSGETVQERKQKDRKALHELKLEEWKGRLQVTEHLPEKIESSLQEDEPE 260
            K SG +++E           ++  ++K RL   E++ +  +   +++ PE
Sbjct: 203 MKASGTSMEE-----------VRYWQYKWRLDRDENIQQAFKKLTEDENPE 242





Pedant information for DKFZphfbr2_2bl7, frame 3 



Report for DKFZphfbr2_2bl7.3

[LENGTH] 285

[MW] 32177.88

[pI] 8.65

[HOMOL] PIR:JQ1024 hypothetical 30K protein (DmRP140 5' region) - fruit fly (Drosophila
melanogaster) 7e-20

[PROSITE] MYRISTYL 7

[PROSITE] CK2_PHOSPHO_SITE 5 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] SIGNAL_PEPTIDE 25 

[KW] TRANSMEMBRANE 3 

[KW] LOW_COMPLEXITY 5.96 % 



SEQ MEVPPPAPRSFLCRALCLFPRVFAAEAVTADSEVLEERQKRLPYVPEPYYPESGWDRLRE

SEG 

PRD cccccccceeeeeeeeeehhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhh 

MEM 

SEQ LFGKDEQQRISKDLANICKTAATAGIIGWVYGGIPAFIHAKQQYIEQSQAEIYHNRFDAV

SEG 

PRD hhcccchhhhhhhhhhhhhhhhcccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ QSAHRAATRGFIRYGWRWGWRTAVFVTIFNTVNTSLNVYRNKDALSHFVIAGAVTGSLFR

SEG 

PRD hhhhhhhhhhhccccccccceeeeeeeeccccccceeecccccccceeeeecccccceee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMM M 

SEQ INVGLRGLVAGGIIGALLGTPVGGLLMAFQKYSGETVQERKQKDRKALHELKLEEWKGRL 
SEG xxxxxxxxxxxxxxxxx

PRD eecccccccccceeeeeccccccchhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ QVTEHLPEKIESSLQEDEPENDAKKIEALLNLPRNPSVIDKQDKD 

SEG 

PRD ccccccccchhhhhccccccchhhhhhhhhhcccccceeeccccc 

MEM 



Prosite for DKFZphfbr2_2bl7.3

PS00001 153->157 ASN_GLYCOSYLATION PDOC00001
PS00006 53->57 CK2_PHOSPHO_SITE PDOC00006
PS00006 108->112 CK2_PHOSPHO_SITE PDOC00006
PS00006 216->220 CK2_PHOSPHO_SITE PDOC00006
PS00006 253->257 CK2_PHOSPHO_SITE PDOC00006
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006
PS00008 92->98 MYRISTYL PDOC00008
PS00008 172->178 MYRISTYL PDOC00008
PS00008 187->193 MYRISTYL PDOC00008
PS00008 191->197 MYRISTYL PDOC00008
PS00008 195->201 MYRISTYL PDOC00008
PS00008 199->205 MYRISTYL PDOC00008
PS00008 204->210 MYRISTYL PDOC00008

(No Pfam data available for DKFZphfbr2_2bl7.3)
DKFZphfbr2_2b5



group: cell structure and motility 

DKFZphfbr2_2b5 encodes a novel 957 amino acid protein with strong similarity to collagens.

The novel protein contains the typical (xxG)n repeat of collagen proteins and a Pfam von
Willebrand factor type A domain. Therefore, the protein seems to be a new collagen alpha chain.

The new protein can find application in modulation of connective tissue, bone and cartilage
development and maintenance.
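The (xxG)n repeat can be measured directly, since every third residue in it is glycine. A minimal sketch that reports the longest run of consecutive X-X-G triplets over all three frames; `longest_xxg_run` is a hypothetical helper, and the segment is residues 851-880 of this entry's frame-2 peptide, copied from its peptide listing.

```python
def longest_xxg_run(seq: str) -> int:
    """Length, in triplets, of the longest run of consecutive X-X-G triplets.

    All three frames are tried, because the repeat can start at any offset.
    """
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i + 2] == "G":
                run += 1
                best = max(best, run)
            else:
                run = 0  # the repeat is broken; start counting again
    return best


# Residues 851-880 of the DKFZphfbr2_2b5 peptide.
segment = "DGVPGLVGVPGRPGVRGLKGLPGRNGEKGS"
print(longest_xxg_run(segment))  # 9
```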



similarity to collagen proteins 

shows typical (xxG)n repeat of collagen proteins 
[PFAM] von Willebrand factor type A domain 



Sequenced by Qiagen 
Locus: /map="6"
Insert length: 4160 bp 

Poly A stretch at pos. 4141, polyadenylation signal at pos. 4119 



1 GGGGGCCCGC TGCAGGGAGA ACGGACTCCG GGCGGAGGGC AGCCAATCCG
51 TTTCAGCGCA GGTCTTGCTC GGGTTGGGCT TGCCACTGCC TGGAACATAC
101 CTGTCCCCCT GGCGCAACAC TCAGCTGGCT GCGACCGCAA CCCCGAGCCT
151 GGACACTGCG CCAGGAATCC TAAAACCAAA ATATTAGAAC GAAAACAGAA
201 ACATGGCTCA CTATATTACA TTTCTCTGCA TGGTTTTGGT GCTGCTTCTT
251 CAGAATTCTG TGTTAGCTGA AGATGGGGAA GTAAGATCAA GTTGTCGTAC
301 TGCTCCGACA GATTTAGTTT TCATCTTAGA TGGCTCTTAT AGTGTTGGCC
351 CAGAAAACTT TGAAATAGTG AAAAAGTGGC TTGTCAATAT CACAAAAAAC
401 TTTGACATAG GGCCGAAGTT TATTCAAGTT GGAGTGGTTC AATATAGTGA
451 CTACCCTGTG CTGGAGATTC CTCTCGGAAG CTATGATTCA GGAGAACATT
501 TGACGGCAGC AGTGGAATCC ATACTCTACT TAGGAGGAAA CACAAAGACA
551 GGGAAGGCCA TCCAGTTTGC GCTCGATTAC CTTTTTGACA AGTCCTCACG
601 ATTTCTGACT AAGATAGCAG TGGTACTTAC GGATGGCAAG TCCCAAGATG
651 ACGTCAAGGA TGCAGCTCAA GCAGCAAGAG ATAGTAAGAT AACATTATTT
701 GCTATTGGTG TTGGTTCAGA AACAGAAGAT GCCGAACTTA GAGCTATTGC
751 CAACAAGCCT TCGTCTACTT ATGTGTTTTA TGTGGAAGAC TATATTGCAA
801 TATCCAAAAT AAGGGAAGTG ATGAAGCAGA AACTTTGTGA AGAATCTGTC
851 TGTCCAACAC GAATTCCAGT GGCAGCTCGT GATGAAAGGG GATTTGATAT
901 TCTTTTGGGT TTAGATGTAA ATAAAAAGGT TAAGAAAAGA ATACAGCTTT
951 CACCAAAAAA GATAAAAGGA TATGAAGTAA CATCAAAAGT TGATTTATCA
1001 GAACTCACAA GCAATGTTTT CCCAGAAGGT CTTCCTCCAT CATATGTATT
1051 TGTGTCTACT CAAAGATTTA AAGTCAAGAA AATTTGGGAT TTATGGAGAA
1101 TATTAACTAT TGATGGAAGG CCACAAATAG CAGTTACCTT AAATGGTGTG
1151 GACAAAATCT TATTATTTAC AACAACCAGC GTAATTAATG GCTCACAAGT
1201 GGTTACCTTT GCTAACCCTC AAGTTAAGAC GTTGTTTGAT GAAGGCTGGC
1251 ACCAAATTCG TCTCTTAGTA ACAGAACAAG ATGTGACTTT GTATATTGAT
1301 GACCAACAAA TTGAAAACAA GCCCTTACAT CCAGTTTTAG GGATCTTGAT
1351 CAATGGGCAA ACCCAAATTG GAAAATATTC TGGAAAAGAA GAAACTGTTC
1401 AGTTTGATGT CCAAAAGTTG CGAATCTACT GTGACCCAGA ACAGAACAAC
1451 CGGGAGACAG CATGTGAGAT TCCTGGATTT AATGGAGAGT GCCTTAATGG
1501 TCCCAGTGAT GTAGGTTCAA CTCCAGCTCC CTGTATTTGT CCTCCGGGAA
1551 AACCAGGACT TCAAGGCCCC AAAGGTGACC CTGGACTGCC TGGGAACCCT
1601 GGCTACCCTG GACAACCTGG TCAAGATGGT AAGCCTGGAT ATCAGGGAAT
1651 TGCAGGGACA CCAGGTGTTC CAGGATCTCC AGGAATACAA GGAGCTCGAG
1701 GACTACCAGG TTACAAAGGA GAACCAGGGC GAGATGGTGA CAAGGGTGAT
1751 CGTGGACTTC CTGGTTTTCC TGGGCTTCAT GGCATGCCAG GATCAAAGGG
1801 TGAAATGGGT GCCAAAGGAG ACAAAGGATC ACCTGGATTT TATGGCAAAA
1851 AGGGTGCAAA AGGTGAAAAG GGGAATGCTG GCTTCCCTGG CCTCCCTGGA
1901 CCTGCTGGAG AACCAGGAAG ACATGGAAAG GATGGATTAA TGGGTAGTCC
1951 CGGTTTCAAG GGAGAAGCAG GATCCCCTGG TGCTCCGGGG CAGGATGGAA
2001 CACGGGGAGA GCCTGGAATC CCAGGATTTC CTGGAAACCG AGGATTAATG
2051 GGCCAAAAGG GAGAAATTGG GCCTCCAGGA CAGCAAGGAA AAAAAGGAGC
2101 CCCAGGGATG CCTGGTTTAA TGGGAAGCAA TGGCTCACCA GGCCAGCCTG
2151 GAACACCGGG ATCTAAGGGA AGCAAAGGTG AACCTGGAAT TCAAGGGATG
2201 CCTGGGGCTT CAGGGCTCAA GGGAGAACCA GGAGCAACGG GTTCCCCAGG
2251 AGAACCAGGA TACATGGGTT TACCCGGGAT TCAAGGAAAA AAGGGGGACA
2301 AAGGAAATCA AGGTGAAAAA GGTATTCAGG GTCAAAAGGG AGAAAATGGA
2351 AGACAGGGAA TTCCAGGGCA ACAGGGAATT CAAGGCCATC ATGGTGCAAA
2401 AGGAGAGAGA GGTGAAAAGG GAGAACCTGG TGTCCGAGGT GCCATTGGAT
2451 CAAAAGGAGA ATCTGGGGTG GATGGCTTGA TGGGGCCCGC AGGTCCTAAG
2501 GGGCAACCTG GGGATCCAGG TCCTCAGGGA CCCCCAGGTT TGGATGGGAA
2551 GCCCGGAAGA GAGTTTTCAG AACAATTTAT TCGACAAGTT TGCACAGATG
2601 TAATAAGAGC CCAGCTACCA GTCTTACTTC AGAGTGGAAG AATTAGAAAT
2651 TGTGATCATT GCCTGTCCCA ACATGGCTCC CCGGGTATTC CTGGGCCACC
2701 TGGTCCGATA GGCCCAGAGG GTCCCAGAGG ATTACCTGGT TTGCCAGGAA
2751 GAGATGGTGT TCCTGGATTA GTGGGTGTCC CTGGACGTCC AGGTGTCAGA
2801 GGATTAAAAG GCCTACCAGG AAGAAATGGG GAAAAAGGGA GCCAAGGGTT
2851 TGGGTATCCT GGAGAACAAG GTCCTCCTGG TCCCCCAGGT CCAGAGGGCC
2901 CTCCTGGAAT AAGCAAAGAA GGTCCTCCAG GAGACCCAGG TCTCCCTGGC
2951 AAAGATGGAG ACCATGGAAA ACCTGGAATC CAAGGGCAAC CAGGCCCCCC
3001 AGGCATCTGC GACCCATCAC TATGTTTTAG TGTAATTGCC AGAAGAGATC
3051 CGTTCAGAAA AGGACCAAAC TATTAGTGTC TGATGCCTCA TTCAGCAGCC
3101 TAGGCATGGT GCTTTTTCTG TGGTCTTTTG CATCTCAGGA AGATAACCAA
3151 CAGTATCCCT TGAAAAGAAA CTTAAGTACC TCGGTGTTTT TATTTTTTTT
3201 TTCTTATGGA AAAAAATATA AAAGATCACA TATACTGATT TTAAAGGCTC
3251 CTCAGTCATT TGGAGCCCTT GGATTAGCAG CATTAATTAA ATCTCAAGGG
3301 TTTCTTGTAA AGTCCATTTA TGTTAATCAA AGTTGAATAT AAAAATCCAC
3351 CATTGCCTGT TAGCCAGTCA GTTTTAGTCA CTGTGAAATA TTTCACATTC
3401 AGCCTCCATG CAGTAGAGAT TTGAGTTTAA TTTCATGTCC ATGTGACTTT
3451 CATGTTTCCT ATCTCATAGC TCATGCTACT ACATAAGCCA AAACATGTAT
3501 CTCATCATTG GAAGTAAGAT CAGGGCTGAT ATTCACCTGG GATAGACAGT
3551 ATTGGTGAAC TACTCATTTA CTACAGTGTC TCAGCCTTGA TAAAGGGCAG
3601 TGGATTGCCT GTTGTTCGGT GTTGTGAATA GCACCTCTGA ATAAGATTAG
3651 AGTGTTTCTT AATTCATTTC AAACTCTAAA ATTAGATTAA TGGTGGTGCT
3701 AAGAAAGAGT ATTAATTACT TTGGGAATGG TCAAAATTAA CATTAAAAAC
3751 ATTTTAGACA AAAAGTTTCA TTGTACATTC AAAGAAAATG TAAGTTTGGA
3801 AGTACTAAAA GACTATTTTA TACTTGTTGA TTAATCGGAA TGTTTGTTGT
3851 ATGCCTTCAT TTTCCATTTC ACTTATATGT GCATGTCCAT ATATGTTAAT
3901 TTTCATTGTA GCAAAGCTAA TGGAAATAAA GCTAATGCTC TAGTTGAAAG
3951 AAAAGGAAAA CTCCTGAAAT CCTAGAATGT CTTGTTATTT TTAGCTGACT
4001 GTAAAATATT ATGAACAGTC TTTGTGTATT GTGCTTAATG CTTTTGTAAG
4051 AAACAGAATT TGAAATATTT CATCCTTGTC ATGCTCAAAA TTTTGTTACA
4101 TGCTTGTTAT TCAGAGTATA ATAAAGTTTT GTACAGGCCT GAAAAAAAAA
4151 AAAAAAAAAA



BLAST Results 



Entry HS682J15 from database EMBLNEW:

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 682J15
Score = 6240, P = 0.0e+00, identities = 1256/1263
13 exons matching Bp 2015-4118

Entry HS708F5 from database EMBLNEW:

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 708F5
Score = 2775, P = 1.0e-221, identities = 739/912
10 exons matching Bp 5-1745

Medline entries

No Medline entry



Peptide information for frame 2 



ORF from 203 bp to 3073 bp; peptide length: 957 
Category: similarity to known protein 



1 MAHYITFLCM VLVLLLQNSV LAEDGEVRSS CRTAPTDLVF ILDGSYSVGP
51 ENFEIVKKWL VNITKNFDIG PKFIQVGVVQ YSDYPVLEIP LGSYDSGEHL
101 TAAVESILYL GGNTKTGKAI QFALDYLFDK SSRFLTKIAV VLTDGKSQDD
151 VKDAAQAARD SKITLFAIGV GSETEDAELR AIANKPSSTY VFYVEDYIAI
201 SKIREVMKQK LCEESVCPTR IPVAARDERG FDILLGLDVN KKVKKRIQLS
251 PKKIKGYEVT SKVDLSELTS NVFPEGLPPS YVFVSTQRFK VKKIWDLWRI
301 LTIDGRPQIA VTLNGVDKIL LFTTTSVING SQVVTFANPQ VKTLFDEGWH
351 QIRLLVTEQD VTLYIDDQQI ENKPLHPVLG ILINGQTQIG KYSGKEETVQ
401 FDVQKLRIYC DPEQNNRETA CEIPGFNGEC LNGPSDVGST PAPCICPPGK
451 PGLQGPKGDP GLPGNPGYPG QPGQDGKPGY QGIAGTPGVP GSPGIQGARG
501 LPGYKGEPGR DGDKGDRGLP GFPGLHGMPG SKGEMGAKGD KGSPGFYGKK
551 GAKGEKGNAG FPGLPGPAGE PGRHGKDGLM GSPGFKGEAG SPGAPGQDGT
601 RGEPGIPGFP GNRGLMGQKG EIGPPGQQGK KGAPGMPGLM GSNGSPGQPG
651 TPGSKGSKGE PGIQGMPGAS GLKGEPGATG SPGEPGYMGL PGIQGKKGDK
701 GNQGEKGIQG QKGENGRQGI PGQQGIQGHH GAKGERGEKG EPGVRGAIGS
751 KGESGVDGLM GPAGPKGQPG DPGPQGPPGL DGKPGREFSE QFIRQVCTDV
801 IRAQLPVLLQ SGRIRNCDHC LSQHGSPGIP GPPGPIGPEG PRGLPGLPGR






851 DGVPGLVGVP GRPGVRGLKG LPGRNGEKGS QGFGYPGEQG PPGPPGPEGP 
901 PGISKEGPPG DPGLPGKDGD HGKPGIQGQP GPPGICDPSL CFSVIARRDP 
951 FRKGPNY 

BLASTP hits 
Entry HSCOL7A1X_1 from database TREMBL:

gene: "COL7A1"; product: "collagen type VII"; Homo sapiens (clones:
CW52-2, CW27-6, CW15-2, CW26-5, 11-67) collagen type VII intergenic
region and (COL7A1) gene, complete cds.

Score = 949, P = 3.4e-122, identities = 237/553, positives = 281/553
Entry CA17_HUMAN from database SWISSPROT:

COLLAGEN ALPHA 1(VII) CHAIN PRECURSOR (LONG-CHAIN COLLAGEN) (LC
COLLAGEN). >TREMBL:HSCOL7A1_1 gene: "COL7A1"; product: "alpha-1 type
VII collagen"; Human alpha-1 type VII collagen (COL7A1) mRNA, complete
cds.

Score = 949, P = 3.6e-122, identities = 237/553, positives = 281/553



Alert BLASTP hits for DKFZphfbr2_2b5, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2b5, frame 2 



Report for DKFZphfbr2_2b5.2
[BLOCKS] BL00420A Speract receptor repeat proteins domain proteins

[SCOP] d1zoob 3.45.1.1.1 Integrin CD11a/CD18 (LFA-1) [Human (Hom 2e-58

[SCOP] d1ido_2 3.45.1.1.2 Integrin CR3 (CD11b/CD18), alpha subunit [Huma 8e-62

[EC] 3.1.1.7 Acetylcholinesterase 7e-24

[PIRKW] blocked amino end 1e-43

[PIRKW] duplication 7e-46

[PIRKW] cornea 1e-35

[PIRKW] lung 2e-40

[PIRKW] leukocyte 1e-42

[PIRKW] skin 1e-40

[PIRKW] transmembrane protein 1e-37

[PIRKW] cartilage 3e-59

[PIRKW] hydroxylysine 4e-62

[PIRKW] connective tissue 3e-43

[PIRKW] triple helix 5e-82

[PIRKW] homotrimer 2e-37

[PIRKW] bone 6e-40

[PIRKW] Alport syndrome 1e-42

[PIRKW] laminin binding 2e-40

[PIRKW] liver 2e-40

[PIRKW] glycoprotein 5e-82

[PIRKW] carboxylic ester hydrolase 7e-24

[PIRKW] disulfide bond 7e-46

[PIRKW] cell binding 7e-46

[PIRKW] heterotrimer 4e-62

[PIRKW] calcium binding 8e-28

[PIRKW] alternative splicing 5e-82

[PIRKW] coiled coil 5e-82

[PIRKW] basement membrane 7e-46

[PIRKW] trimer 5e-82

[PIRKW] pyroglutamic acid 3e-43

[PIRKW] hydroxyproline 4e-62

[PIRKW] extracellular matrix 5e-82

[PIRKW] chondroitin sulfate proteoglycan 6e-41

[PIRKW] sulfoprotein 7e-39

[PIRKW] kidney 1e-42

[PIRKW] angiogenesis inhibitor 6e-36

[PIRKW] Ehlers-Danlos syndrome 2e-40

[SUPFAM] fibronectin type III repeat homology 5e-82

[SUPFAM] scavenger receptor cysteine-rich domain homology 1e-37

[SUPFAM] C-type lectin homology 6e-30

[SUPFAM] collagen alpha 2(I) chain 5e-40

[SUPFAM] collagen alpha 1(I) chain 6e-44
[SUPFAM] fibrillar collagen carboxyl-terminal homology 6e-44

[SUPFAM] animal Kunitz-type proteinase inhibitor homology 2e-38 

[SUPFAM] fibronectin type II repeat homology 6e-21 

[SUPFAM] complement C1q carboxyl-terminal homology 1e-38

[SUPFAM] collagen alpha 3(VI) chain 2e-31 

[SUPFAM] collagen alpha 1(IV) chain 7e-46

[SUPFAM] collagen alpha 1(VI) chain 2e-37 

[SUPFAM] von Willebrand factor type C repeat homology 6e-44

[SUPFAM] unassigned collagens 4e-62 

[SUPFAM] von Willebrand factor type A repeat homology 5e-82 

[SUPFAM] collagen alpha 1(XIV) chain 5e-82

[SUPFAM] pulmonary surfactant protein D 6e-30 

[SUPFAM] collagen alpha 1(V) chain 7e-39 

[SUPFAM] collagen alpha 1(VIII) chain 1e-38

[SUPFAM] EGF homology 1e-35

[PROSITE] AMIDATION 3

[PROSITE] MYRISTYL 14 

[PROSITE] CK2_PHOSPHO_SITE 13

[PROSITE] PKC_PHOSPHO_SITE 8 

[PROSITE] ASN_GLYCOSYLATION 2 

[PFAM] von Willebrand factor type A domain 

[KW] Irregular 

[KW] 3D

[KW] SIGNAL PEPTIDE 23 

[KW] LOW_COMPLEXITY 24.24 % 
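The LOW_COMPLEXITY keyword appears to give the percentage of residues that SEG masks (the x characters in the SEG rows). This is an inference rather than documented Pedant behaviour: 232 masked residues out of 957, 68 out of 697 and 14 out of 446 reproduce the three values in this listing, and the x counts in the SEG rows are consistent with those numbers. A sketch under that assumption, with hypothetical function names:

```python
def masked_residue_count(seg_rows: list[str]) -> int:
    """Count residues masked by SEG, i.e. the 'x' characters in the SEG rows."""
    return sum(row.count("x") for row in seg_rows)


def low_complexity_percent(masked: int, length: int) -> float:
    """Share of the peptide covered by low-complexity segments, in percent (2 decimals)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return round(100 * masked / length, 2)


if __name__ == "__main__":
    # Masked-residue counts that reproduce the three LOW_COMPLEXITY values in this listing.
    for masked, length in [(232, 957), (68, 697), (14, 446)]:
        print(masked, length, low_complexity_percent(masked, length))  # 24.24, 9.76, 3.14
```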

SEQ MAHYITFLCMVLVLLLQNSVLAEDGEVRSSCRTAPTDLVFILDGSYSVGPENFEIVKKWL 

SEG 

1atzB CCCEEEEEEEECCCCCCHHHHHHHHHHH

SEQ VNITKNFDIGPKFIQVGVVQYSDYPVLEIPLGSYDSGEHLTAAVESILYLGGNTKTGKAI

SEG 

1atzB HHHHHHCCBTTTTEEEEEEEETTTEEEEETTTTTTTHHHHHHHHHHCCCCCCCCCHHHHH

SEQ QFALDYLFDKSSRFLTKIAVVLTDGKSQDDVKDAAQAARDSKITLFAIGVGSETEDAELR 

SEG 

1atzB HHHHHHHHCCTTTTTEEEEEEEECCCTTTTHHHHHHHHHHHCEEEEEEEECCCCCHHHHH

SEQ AIANKPSSTYVFYVEDYIAISKIREVMKQKLCEESVCPTRIPVAARDERGFDILLGLDVN 

SEG 

1atzB HHHGGGGGGGCECCHHHHHHHHHCHHHHHHHH

SEQ KKVKKRIQLSPKKIKGYEVTSKVDLSELTSNVFPEGLPPSYVFVSTQRFKVKKIWDLWRI 

SEG 

1atzB

SEQ LTIDGRPQIAVTLNGVDKILLFTTTSVINGSQVVTFANPQVKTLFDEGWHQIRLLVTEQD 

SEG 

1atzB

SEQ VTLYIDDQQIENKPLHPVLGILINGQTQIGKYSGKEETVQFDVQKLRIYCDPEQNNRETA 

SEG 

1atzB

SEQ CEIPGFNGECLNGPSDVGSTPAPCICPPGKPGLQGPKGDPGLPGNPGYPGQPGQDGKPGY 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

1atzB

SEQ QGIAGTPGVPGSPGIQGARGLPGYKGEPGRDGDKGDRGLPGFPGLHGMPGSKGEMGAKGD 

SEG xx 

1atzB

SEQ KGSPGFYGKKGAKGEKGNAGFPGLPGPAGEPGRHGKDGLMGSPGFKGEAGSPGAPGQDGT

SEG xxxxxxxxxxxxx 

1atzB

SEQ RGEPGIPGFPGNRGLMGQKGEIGPPGQQGKKGAPGMPGLMGSNGSPGQPGTPGSKGSKGE

SEG xxxxxxxxxxxxxxxxxxxxxx 

1atzB

SEQ PGIQGMPGASGLKGEPGATGSPGEPGYMGLPGIQGKKGDKGNQGEKGIQGQKGENGRQGI 

SEG xxxxxxxxxxxxxxxxxxxxx 

1atzB

SEQ PGQQGIQGHHGAKGERGEKGEPGVRGAIGSKGESGVDGLMGPAGPKGQPGDPGPQGPPGL 

SEG xxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 

1atzB

SEQ DGKPGREFSEQFIRQVCTDVIRAQLPVLLQSGRIRNCDHCLSQHGSPGIPGPPGPIGPEG 
SEG xxxxx xxxxxxxxxxxxxxxx 
1atzB

SEQ PRGLPGLPGRDGVPGLVGVPGRPGVRGLKGLPGRNGEKGSQGFGYPGEQGPPGPPGPEGP 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx 

1atzB

SEQ PGISKEGPPGDPGLPGKDGDHGKPGIQGQPGPPGICDPSLCFSVIARRDPFRKGPNY 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

1atzB



Prosite for DKFZphfbr2_2b5.2



PS00001 62->66 ASN_GLYCOSYLATION PDOC00001
PS00001 329->333 ASN_GLYCOSYLATION PDOC00001
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005
PS00005 131->134 PKC_PHOSPHO_SITE PDOC00005
PS00005 250->253 PKC_PHOSPHO_SITE PDOC00005
PS00005 260->263 PKC_PHOSPHO_SITE PDOC00005
PS00005 286->289 PKC_PHOSPHO_SITE PDOC00005
PS00005 393->396 PKC_PHOSPHO_SITE PDOC00005
PS00005 811->814 PKC_PHOSPHO_SITE PDOC00005
PS00006 147->151 CK2_PHOSPHO_SITE PDOC00006
PS00006 172->176 CK2_PHOSPHO_SITE PDOC00006
PS00006 261->265 CK2_PHOSPHO_SITE PDOC00006
PS00006 343->347 CK2_PHOSPHO_SITE PDOC00006
PS00006 357->361 CK2_PHOSPHO_SITE PDOC00006
PS00006 393->397 CK2_PHOSPHO_SITE PDOC00006
PS00006 419->423 CK2_PHOSPHO_SITE PDOC00006
PS00006 531->535 CK2_PHOSPHO_SITE PDOC00006
PS00006 600->604 CK2_PHOSPHO_SITE PDOC00006
PS00006 657->661 CK2_PHOSPHO_SITE PDOC00006
PS00006 681->685 CK2_PHOSPHO_SITE PDOC00006
PS00006 750->754 CK2_PHOSPHO_SITE PDOC00006
PS00006 754->758 CK2_PHOSPHO_SITE PDOC00006
PS00008 92->98 MYRISTYL PDOC00008
PS00008 112->118 MYRISTYL PDOC00008
PS00008 236->242 MYRISTYL PDOC00008
PS00008 276->282 MYRISTYL PDOC00008
PS00008 380->386 MYRISTYL PDOC00008
PS00008 494->500 MYRISTYL PDOC00008
PS00008 527->533 MYRISTYL PDOC00008
PS00008 596->602 MYRISTYL PDOC00008
PS00008 638->644 MYRISTYL PDOC00008
PS00008 650->656 MYRISTYL PDOC00008
PS00008 653->659 MYRISTYL PDOC00008
PS00008 665->671 MYRISTYL PDOC00008
PS00008 743->749 MYRISTYL PDOC00008
PS00008 746->752 MYRISTYL PDOC00008
PS00009 547->551 AMIDATION PDOC00009
PS00009 628->632 AMIDATION PDOC00009
PS00009 694->698 AMIDATION PDOC00009
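Each hit above pairs a PROSITE pattern with a sequence range. The ranges appear to use a 1-based start and an exclusive end: 62->66 covers residues 62 to 65, which read N-I-T-K and satisfy the N-glycosylation pattern N-{P}-[ST]-{P}. That reading is inferred from the sequence, not from Pedant documentation. The sketch below converts simple PROSITE patterns into Python regular expressions and scans a sequence the same way, reporting overlapping hits as the table does (for example MYRISTYL at 650->656 and 653->659). It ignores the '<' and '>' anchors and other less common syntax, and the function names are illustrative.

```python
import re

# One PROSITE element: x, a residue, [allowed] or {forbidden}, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern (no '<' or '>' anchors) into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(sequence: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs with a 1-based start and exclusive end, including overlaps."""
    # A capturing group inside a lookahead lets finditer report overlapping matches.
    finder = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + 1 + len(m.group(1))) for m in finder.finditer(sequence)]


# Residues 1-120 of the DKFZphfbr2_2b5 frame-2 peptide listed above.
FIRST_120 = (
    "MAHYITFLCMVLVLLLQNSVLAEDGEVRSSCRTAPTDLVFILDGSYSVGPENFEIVKKWL"
    "VNITKNFDIGPKFIQVGVVQYSDYPVLEIPLGSYDSGEHLTAAVESILYLGGNTKTGKAI"
)

if __name__ == "__main__":
    print(scan(FIRST_120, "N-{P}-[ST]-{P}"))  # PS00001: [(62, 66)]
    print(scan(FIRST_120, "[ST]-x-[RK]"))  # PS00005: [(30, 33), (116, 119)]
    print(scan(FIRST_120, "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}"))  # PS00008: [(92, 98), (112, 118)]
```

On residues 1-120 these three scans agree with the corresponding rows of the table.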
HMM_NAME von Willebrand factor type A domain

HMM *DIVFLIDGSdSIGpqNFNrMKDFIeRMMERMDIgPDwIRVGVVQYSdNP 
D+VF++DGS S+GP NF+++K+ ++++ ++DIGP+ I+VGVVQYSD P 
Query 37 DLVFILDGSYSVGPENFEIVKKWLVNITKNFDIGPKFIQVGVVQYSDYP 85 

HMM RqEmrFmFNDYQNKeEILQaIqqMMyWMgggTNTGeAIQYVvrNMFweer

E +++ Y + E++++A+ ++ ++GG T+TG AIQ++++++F +++ 
Query 86 VLE--IPLGSYDSGEHLTAAVESIL-YLGGNTKTGKAIQFALDYLFDKSS 132

HMM GmRWenvPQVMIIITDGRSQDDIRDpIneMrrmaGIqvFalGIGNhDNnn 

+ +++++++TDG+SQDD++D+++++R+ 1+ FAIG+G 

Query 133 RF LTKIAVVLTDGKSQDDVKDAAQAARD-SKITLFAIGVGSETE — 175 

HMM WeELRelASePdEdHVFyVdDFeeLdnMqeqL* 

+ELR IA++P++ +VFYV+D+ +++ ++E + 
Query 176 DAELRAIANKPSSTYVFYVEDYIAISKIREVM 207 
DKFZphfbr2_2cl
group: brain derived 

DKFZphfbr2_2cl encodes a novel 697 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus : unknown 

Insert length: 3973 bp 

Poly A stretch at pos. 3914, polyadenylation signal at pos. 3900 

1 GGGGGGATTT CGGCGGCGGA AACATGGCGG TCGCGGCCGG GCCGGTAACG 
51 GAGAAAGTTT ACGCCGACAC TGGCCTGTAT TAGCGCGTAT GGCCTCGGGC 

101 CCTCGTTCCC CAAGGCGTGC CGCCTCCCTG TTCTCAGTCG CAGGCTGAAG 

151 CCTTGTCTGC TCTCCTCCTT TTTGGTTTGG TTTTGGAACT GACTCCGAGG 

201 GTTGGGAGAG CGCGTTGGTG GCGACGGCCG AGTCAGATCA CTATAAACAA 

251 AATTTCCACA AGAGAAAATG TTGAAATAGG AGTTGCGGAT ACATTGGATA 

301 TACTGGATGA AATACAAGCG GTTAATTTTT GTAACGTGAG GGAAAAGCCC 

351 ACATTGCTGG TTACATGTGT AAATCACTGC GTTATTGCTT TAGTCATTGT 

401 CTCTATTTAG CAATGACAAG ACTGGAAGAA GTAAATAGAG AAGTGAACAT 

451 GCATTCTTCA GTGCGGTATC TTGGCTATTT AGCCAGAATC AATTTATTGG

501 TTGCTATATG CTTAGGTCTA TACGTAAGAT GGGAAAAAAC AGCAAATTCC 

551 TTAATTTTGG TAATTTTTAT TCTTGGTCTT TTTGTTCTTG GAATCGCCAG 

601 CATACTCTAT TACTATTTTT CAATGGAAGC AGCAAGTTTA AGTCTCTCCA 

651 ATCTTTGGTT TGGATTCTTG CTTGGCCTCC TATGTTTTCT TGATAATTCA 

701 TCCTTTAAAA ATGATGTAAA AGAAGAATCA ACCAAATATT TGCTTCTAAC 

751 ATCCATAGTG TTAAGGATAT TGTGCTCTCT GGTGGAGAGA ATTTCTGGCT 

801 ATGTCCGTCA TCGGCCCACT TTACTAACCA CAGTTGAATT TCTGGAGCTT 

851 GTTGGATTTG CCATTGCCAG CACAACTATG TTGGTGGAGA AGTCTCTGAG 

901 TGTCATTTTG CTTGTTGTAG CTCTGGCTAT GCTGATTATT GATCTGAGAA 

951 TGAAATCTTT CTTAGCTATT CCAAACTTAG TTATTTTTGC AGTTTTGTTA 
1001 TTTTTTTCCT CATTGGAAAC TCCCAAAAAT CCGATTGCTT TTGCGTGTTT 
1051 TTTTATTTGC CTGATAACTG ATCCTTTCCT TGACATTTAT TTTAGTGGAC
1101 TTTCAGTAAC TGAAAGATGG AAACCCTTTT TGTACCGTGG AAGAATTTGC 
1151 AGAAGACTTT CAGTCGTTTT TGCTGGAATG ATTGAGCTTA CATTTTTTAT 
1201 TCTTTCCGCA TTCAAACTTA GAGACACTCA CCTCTGGTAT TTTGTAATAC 
1251 CTGGCTTTTC CATTTTTGGA ATTTTCAGGA TGATTTGTCA TATTATTTTT 
1301 CTTTTAACTC TTTGGGGATT CCATACCAAA TTAAATGACT GCCATAAAGT 
1351 ATATTTTACT CACAGGACAG ATTACAATAG CCTTGATAGA ATCATGGCAT 
1401 CCAAAGGGAT GCGCCATTTT TGCTTGATTT CAGAGCAGTT GGTGTTCTTT 
1451 AGTCTTCTTG CAACAGCGAT TTTGGGAGCA GTTTCCTGGC AGCCAACAAA
1501 TGGAATTTTC TTGAGCATGT TCCTAATCGT TTTGCCATTG GAATCCATGG 
1551 CTCATGGGCT CTTCCATGAA TTGGGTAACT GTTTAGGAGG AACATCTGTT 
1601 GGATATGCTA TTGTGATTCC CACCAACTTC TGCAGTCCTG ATGGTCAGCC 
1651 AACACTGCTT CCCCCAGAAC ATGTACAGGA GTTAAATTTG AGGTCTACTG 
1701 GCATGCTCAA TGCTATCCAA AGATTTTTTG CATATCATAT GATTGAGACC 
1751 TATGGATGTG ACTATTCCAC AAGTGGACTG TCATTTGATA CTCTGCATTC 
1801 CAAACTAAAA GCTTTCCTCG AACTTCGGAC AGTGGATGGA CCCAGACATG 
1851 ATACGTATAT TTTGTATTAC AGTGGGCACA CCCATGGTAC AGGAGAGTGG 
1901 GCTCTAGCAG GTGGAGATAC ACTACGCCTT GACACACTTA TAGAATGGTG 
1951 GAGAGAAAAG AATGGTTCCT TTTGTTCCCG GCTTATTATC GTATTAGACA 
2001 GCGAAAATTC AACCCCTTGG GTGAAAGAAG TGAGGAAAAT TAATGACCAG 
2051 TATATTGCAG TGCAAGGAGC AGAGTTGATA AAAACAGTAG ATATTGAAGA 
2101 AGCTGACCCG CCACAGCTAG GTGACTTTAC AAAAGACTGG GTAGAATATA 
2151 ACTGCAACTC CTGTAATAAC ATCTGCTGGA CTGAAAAGGG ACGCACAGTG 
2201 AAAGCAGTAT ATGGTGTGTC AAAACGGTGG AGTGACTACA CTCTGCATTT 
2251 GCCAACGGGA AGCGATGTGG CCAAGCACTG GATGTTACAC TTTCCTCGTA 
2301 TTACATATCC CCTAGTGCAT TTGGCAAATT GGTTATGCGG TCTGAACCTT 
2351 TTTTGGATCT GCAAAACTTG TTTTAGGTGC TTGAAAAGAT TAAAAATGAG 
2401 TTGGTTTCTT CCTACTGTGC TGGACACAGG ACAAGGCTTC AAACTTGTCA 
2451 AATCTTAATT TGGACCCCAA AGCGGGATAT TAATAAGCAC TCATACTACC 
2501 AATTATCACT AACTTGCCAT TTTTTGTATG CTGTATTTTT ATTTGTGGAA 
2551 AATACCTTGC TACTTCTGTA GCTGCTCTCA CTTTGTCTTT TCTTAAGTAA 
2601 TTATGGTATA TATAAGGCGT TGGGAAAAAA CATTTTATAA TGAAAGTATG 
2651 TAGGGAGTCA AATGCTTACT GTAAATGCAT AAGAGACGTT AAAAATAACA 
2701 CTGCACTTTC AGGAATGTTT GCTTATGGTC CTGATTAGAA AGAAACAGTT 
2751 GTCTATGCTC TGCAATGGTC AATGATGAAT TACTAATGCC TTATTTTCTA
2801 GGCATATAAT AATAGTTTAG AGAATGTAGA CCAGATAAAT TTGTTTACTG
2851 TTTTAAGAAA ACTACCAGTT TACTTACAGA AGATTCTTTT TTCCAAACAG
2901 TAGGTTTCAT CCAAGACCAT TTGAAGAACT GCAAACTCTT TCTCTTAGAA
2951 AAGAAAGAGG GCAGCCTAAA ATAAACGCAA AATTTGCTTA TACTCCATCA
3001 CATTCAGATG TCTTGGTTGT GACTTATTAC CAGTGTGGCA GAGAACCCAA
3051 GTTACATTTT AGATCAAAAT ATTCTTTATG TAGGTATTGT TAAAAGGCTA
3101 GAGCCTACAA GTTGCTCTTC CATGCGTTGG TCAGGGGGCC CTGAAAACAC
3151 TGGTAATATT AAGAGTCTTT CTCAGGGTAA CTTAATGTTT TCTTAATGAA
3201 CAGTGTTTCC AGCTACAAAT TCTTCCAATA AATTGTCTTC CTTTTTGAAA
3251 AGTACTCTCA TAGAAGAAAT TTAGCAATTT CTCGTTGACT GACTCAGTCT
3301 ATTTTAAGTA TTCAGAAAAG ATTTTGATCC CCATTGAGTT AATGCTCTGC
3351 CTTGAAAATT ATTTTTCTGA TCCTTGTTAG TGATAACATT TTTTTTCTAC
3401 TGAAGGTCAG AGGATAGGAA ACAAGTATTT CTCTTCTGGT ATACATGTAA
3451 TGTATTCTGT AAAAAAGTAT TCATATTGGC AATTTTAGTT AGGCATAATA
3501 TTGTGGTTGT AATTTTTAAA ACTTAGTGTT TTGTCTGATT AAAGCAGGCA
3551 CTGATCAGGG TATCTCCTAA GAGGTAATTC ACTTCTTATT CCTTTCCAAT
3601 AATTATTACA TTCTAAATTT TCATCTATGA GAAATAACAA ACAAGAAGGG
3651 AATAGAATTA AATTGGGGTA TAATCTAATC TTCATTGTTT AAATGGTTTG
3701 CCTTCTCACC ATTGAAGCCA TTTTTTTATA GCCTCAGAAA GAGGAAATAA
3751 TGCCTCCACC ATTTTCTACC TGGTGACTTG AAAATTGAAC TTTTAAGTTA
3801 GGAAGAAGTT AGAGTCAGGG AACTTGTATA CCACTATCTA TGCAGCATTG
3851 TTATAGTCTG ATTATTTCTG TGTTTTGAAT ATGATTTTCC TAATGCTCTA
3901 AATAAAATTT TGTTAAAAAT CAAAAAAAAA AAAAAAAAAA CTTATCGATA
3951 CCGTCGACCT CGATGATGTC GAC



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 365 bp to 2455 bp; peptide length: 697
Category: putative protein 
Classification: unset 
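The peptide listed below is the conceptual translation of the ORF in frame 2. A minimal translation sketch using the standard genetic code follows; the helper name `translate` is hypothetical. Applied to nucleotides 365-400 of this insert it returns MCKSLRYCFSHC, the first twelve residues below. Applied to nucleotides 9-50 of DKFZphfbr2_2cl7 (listed further on) it returns MAARKGRRRTCETG, the start of that peptide.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order,
# first base varying slowest, with '*' marking stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence codon by codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


if __name__ == "__main__":
    # Nucleotides 365-400 of DKFZphfbr2_2cl and 9-50 of DKFZphfbr2_2cl7, copied from this listing.
    print(translate("ATGTGTAAATCACTGCGTTATTGCTTTAGTCATTGT"))  # MCKSLRYCFSHC
    print(translate("ATGGCGGCGCGCAAGGGTCGGCGTCGCACGTGTGAAACCGGG"))  # MAARKGRRRTCETG
```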



1 MCKSLRYCFS HCLYLAMTRL EEVNREVNMH SSVRYLGYLA RINLLVAICL
51 GLYVRWEKTA NSLILVIFIL GLFVLGIASI LYYYFSMEAA SLSLSNLWFG
101 FLLGLLCFLD NSSFKNDVKE ESTKYLLLTS IVLRILCSLV ERISGYVRHR
151 PTLLTTVEFL ELVGFAIAST TMLVEKSLSV ILLVVALAML IIDLRMKSFL
201 AIPNLVIFAV LLFFSSLETP KNPIAFACFF ICLITDPFLD IYFSGLSVTE
251 RWKPFLYRGR ICRRLSVVFA GMIELTFFIL SAFKLRDTHL WYFVIPGFSI
301 FGIFRMICHI IFLLTLWGFH TKLNDCHKVY FTHRTDYNSL DRIMASKGMR
351 HFCLISEQLV FFSLLATAIL GAVSWQPTNG IFLSMFLIVL PLESMAHGLF
401 HELGNCLGGT SVGYAIVIPT NFCSPDGQPT LLPPEHVQEL NLRSTGMLNA
451 IQRFFAYHMI ETYGCDYSTS GLSFDTLHSK LKAFLELRTV DGPRHDTYIL
501 YYSGHTHGTG EWALAGGDTL RLDTLIEWWR EKNGSFCSRL IIVLDSENST
551 PWVKEVRKIN DQYIAVQGAE LIKTVDIEEA DPPQLGDFTK DWVEYNCNSC
601 NNICWTEKGR TVKAVYGVSK RWSDYTLHLP TGSDVAKHWM LHFPRITYPL
651 VHLANWLCGL NLFWICKTCF RCLKRLKMSW FLPTVLDTGQ GFKLVKS



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_2cl, frame 2 

PIR:A71148 hypothetical protein PH0395 - Pyrococcus horikoshii, N = 1, 
Score = 96, P = 0.12



>PIR:A71148 hypothetical protein PH0395 - Pyrococcus horikoshii 
Length = 288 

HSPs: 

Score = 96 (14.4 bits), Expect = 1.3e-01, P = 1.2e-01
Identities = 59/234 (25%), Positives = 116/234 (49%)
Query: 77 IASILYYYFSMEAASLSLSNLWFGFLL--GL-- LCFLDNSSFKNDVKEESTKYLLLTSIV 132

++ +LYY F+ A ++ L G+LL + L +L N + V+ + K + + + 
Sbjct: 57 LSLVLYYLFAFSALK-TIIFLALGYLLMNSIYELGYLMNDTISRRVEGKVHKVRVKLTVF 115 

Query: 133 LRILCSLVERISGYVRHRPTLLTTVEFLELVGFAIASTTMLVEKSLSVILLVVALAMLII 192 

+L +L I YV ++ T+ FL+LVG ++ +L E +L ++ L+ L + 

Sbjct: 116 DSLLIALSRAI— YV VIFTLVFLKLVGLQYSTQVILAEVTLFLVFLLYDLTPKHV 168 

Query: 193 DLRMKSFLAIPNLVIFAVLLFFSSLET-PKNPIAFACFFICLITDPFLDI YFSGLSVTER 251 

M SF + + F +LL F T +N I + FI I F ++ + + 
Sbjct: 169 RTVMLSF-PLKFMKAFVLLLPFIITGTLVENVITLS — FILPIAVRFSQAHYLKTACKDN 225 

Query: 252 WKPFLYRGRICRRLSVVFAGMIEL-TFFILSAFK-LRDTHLW-YFVIPGFSIFGIFRMIC 308 

P + + R+ R S+++ + h TF +L +F L +T L ++IP F++ + ++ 
Sbjct: 226 -PPRDFKRRV-ERFSMMYLQVTSLSTFTVLVSFVYLGNTDLLRQYLIP-FAVNVVLILLS 282 

Query: 309 HI 310 
++ 

Sbjct: 283 YL 284 
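The alert hit above reports Expect = 1.3e-01 alongside P = 1.2e-01. Under the Karlin-Altschul statistics used by BLAST, the number of chance hits at a given score is treated as Poisson-distributed with mean E, so the probability of at least one such hit is P = 1 - exp(-E), which reproduces the reported pair. For very small E the two values are practically equal, as in the Arabidopsis hit reported further on (Expect = P = 2.7e-91). A sketch of the conversion, using `math.expm1` to stay accurate when E is tiny; the function name is illustrative:

```python
import math


def p_value_from_expect(expect: float) -> float:
    """P = 1 - exp(-E): chance of at least one hit when hits are Poisson with mean E."""
    if expect < 0:
        raise ValueError("expect must be non-negative")
    # expm1 keeps full precision when E is tiny (1 - exp(-E) would round to 0.0).
    return -math.expm1(-expect)


if __name__ == "__main__":
    print(f"{p_value_from_expect(1.3e-01):.1e}")  # 1.2e-01
    print(f"{p_value_from_expect(2.7e-91):.1e}")  # 2.7e-91
```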



Pedant information for DKFZphfbr2_2cl, frame 2 



Report for DKFZphfbr2_2cl.2



[LENGTH] 697 

[MW] 79741.46 

[PI] 8.41

[KW] TRANSMEMBRANE 11 

[KW] LOW_COMPLEXITY 9.76 % 
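[MW] is the molecular weight of the predicted peptide in daltons. A common way to compute such a value is to add average residue masses plus one water molecule for the free termini. The sketch below does exactly that. The mass table holds commonly tabulated average residue masses; Pedant's own table and rounding are not stated here, so results need not reproduce 79741.46 exactly. The names are illustrative.

```python
# Average residue masses in daltons (free amino acid minus one water), as commonly tabulated.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # average mass of H2O, added once for the free N- and C-termini


def average_mw(peptide: str) -> float:
    """Average molecular weight (Da) of an unmodified linear peptide."""
    return sum(AVERAGE_RESIDUE_MASS[residue] for residue in peptide.upper()) + WATER


if __name__ == "__main__":
    print(round(average_mw("G"), 2))  # 75.07, free glycine
    print(round(average_mw("GG"), 2))  # 132.12, glycylglycine
```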



SEQ MCKSLRYCFSHCLYLAMTRLEEVNREVNMHSSVRYLGYLARINLLVAICLGLYVRWEKTA 

SEG 

PRD ccceeehhhhhhhhhhhhhhhhhhhhhhccceeeehhhhhhhhhhhhhhhhhhhcccccc 

MEM MMMMMMMMMMMMMMMMM 

SEQ NSLILVIFILGLFVLGIASILYYYFSMEAASLSLSNLWFGFLLGLLCFLDNSSFKNDVKE 

SEG . .xxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccceeeeccccchhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

MEM . . . MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMM 

SEQ ESTKYLLLTSIVLRILCSLVERISGYVRHRPTLLTTVEFLELVGFAIASTTMLVEKSLSV 

SEG xxxxxxxxxxxx xxxx 

PRD ccchhhhhhhhhhhhhhhhhhhceeeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM MMM 

SEQ ILLVVALAMLIIDLRMKSFLAIPNLVIFAVLLFFSSLETPKNPIAFACFFICLITDPFLD 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhcccccccccchhhhhhhhhcccccee 

MEM MMMMMMMMMMMMMM . . . MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMM . 

SEQ IYFSGLSVTERWKPFLYRGRICRRLSVVFAGMIELTFFILSAFKLRDTHLWYFVIPGFSI 

SEG 

PRD eeeccccccccccceeecccccccchhhhhhhhhhhhhhhhhhhccccceeeeeeccccc 

MEM MMMMMMMMMMMMMMMMM M 

SEQ FGIFRMICHIIFLLTLWGFHTKLNDCHKVYFTHRTDYNSLDRIMASKGMRHFCLISEQLV 

SEG 

PRD hhhhhhhhhhhhhhhhhcccccccceeeeeeeccccccchhhhhhhcccchhhhhhhhhh 

MEM MMMMMMMMMMMMMMMM MM 

SEQ FFSLLATAILGAVSWQPTNGIFLSMFLIVLPLESMAHGLFHELGNCLGGTSVGYAIVIPT 

SEG 

PRD hhhhhhhhhhhhcccccccchhhhhhhheeehhhhhhhhhhccccccccccceeeeeeec 

MEM MMMMMMMMMMMMMMM . • • ■ MMMMMMMMMMMMMMMMM 

SEQ NFCSPDGQPTLLPPEHVQELNLRSTGMLNAIQRFFAYHMIETYGCDYSTSGLSFDTLHSK 

SEG 

PRD ccccccccccccccccccccccccchhhhhhhhhhhhhhhhccccccccccccchhhhhh 

MEM 



SEQ LKAFLELRTVDGPRHDTYILYYSGHTHGTGEWALAGGDTLRLDTLIEWWREKNGSFCSRL 

SEG 

PRD hhhhhhhhhccccccceeeeeeccccccccceeeccccchhhhhhhhhhhhccccceeee 

MEM 

SEQ IIVLDSENSTPWVKEVRKINDQYIAVQGAELIKTVDIEEADPPQLGDFTKDWVEYNCNSC

SEG

PRD eeeeecccccccchhhhhhccceeeeccceeeeeeeecccccccccccccceeeeccccc 

MEM 

SEQ NNICWTEKGRTVKAVYGVSKRWSDYTLHLPTGSDVAKHWMLHFPRITYPLVHLANWLCGL

SEG 

PRD cceeeecccceeeeeeeecccccceeeecccccchhhhhhhcccccccchhhhhhhhhcc 

MEM 

SEQ NLFWICKTCFRCLKRLKMSWFLPTVLDTGQGFKLVKS 

SEG 

PRD eeeeeehhhhhhhhhhhhhhcceeeeccccccccccc 

MEM 



(No Prosite data available for DKFZphfbr2_2cl.2)
(No Pfam data available for DKFZphfbr2_2cl.2)
DKFZphfbr2_2cl7



group: signal transduction 

DKFZphfbr2_2cl7.3 encodes a novel 446 amino acid protein with similarity to yeast YMR131c and
mammalian retinoblastoma-binding protein RbAp46.

The protein contains one WD-40 repeat, a motif typical of the beta-transducin subunit of
G-proteins. The beta subunits appear to be required for the replacement of GDP by GTP, as well as
for membrane anchoring and receptor recognition.

The new protein can find application in modulating/blocking G-protein-dependent pathways. 

similarity to YMR131C and retinoblastoma-binding protein RbAp46

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 2248 bp 

Poly A stretch at pos. 2230, polyadenylation signal at pos. 2200 

1 TGGGGAAGAT GGCGGCGCGC AAGGGTCGGC GTCGCACGTG TGAAACCGGG 

51 GAACCCATGG AAGCCGAGTC CGGCGACACA AGTTCCGAGG GCCCGGCCCA 

101 GGTCTACCTG CCCGGCCGGG GGCCGCCGCT ACGCGAAGGG GAGGAGCTGG 

151 TCATGGACGA GGAGGCCTAT GTGCTCTACC ACCGAGCGCA GACTGGCGCC 

201 CCCTGTCTCA GCTTTGACAT AGTCCGGGAT CACCTGGGAG ACAACCGGAC 

251 AGAGCTTCCT CTTACACTTT ACTTGTGTGC TGGGACCCAG GCTGAGAGCG 

301 CCCAGAGCAA CAGACTGATG ATGCTTCGGA TGCACAATCT GCATGGGACA 

351 AAGCCCCCAC CCTCAGAGGG CAGTGATGAA GAAGAAGAGG AGGAAGATGA 

401 AGAGGATGAA GAAGAGCGGA AACCTCAGCT GGAGCTGGCC ATGGTGCCCC 

451 ACTATGGTGG CATCAACCGA GTTCGGGTGT CATGGCTGGG TGAAGAGCCT 

501 GTGGCTGGGG TGTGGTCAGA GAAGGGCCAG GTGGAGGTGT TTGCGCTGCG 

551 GCGGCTTCTG CAGGTGGTGG AGGAGCCCCA GGCCCTGGCA GCCTTCCTCC 

601 GGGATGAGCA GGCCCAAATG AAGCCCATCT TCTCCTTCGC TGGACACATG 

651 GGCGAGGGCT TTGCCCTTGA CTGGTCCCCC CGGGTGACCG GTCGCCTGCT 

701 GACCGGTGAC TGTCAAAAGA ACATCCACCT CTGGACACCT ACGGACGGCG 

751 GCTCCTGGCA CGTGGACCAG CGGCCATTCG TGGGCCACAC ACGCTCTGTG 

801 GAGGACCTGC AGTGGTCACC GACTGAGAAC ACGGTGTTTG CCTCCTGCTC 

851 AGCTGACGCC TCCATCCGCA TCTGGGACAT CCGGGCAGCC CCCAGCAAGG 

901 CCTGCATGCT CACCACAGTC ACCGCCCATG ATGGGGACGT CAATGTCATC 

951 AGCTGGAGCC GCCGGGAGCC CTTCCTGCTC AGTGGCGGGG ATGATGGGGC 

1001 CCTCAAGATC TGGGACCTTC GGCAGTTCAA GTCTGGTTCC CCAGTGGCCA 

1051 CCTTCAAGCA GCACGTGGCC CCCGTGACCT CCGTCGAGTG GCACCCCCAG 

1101 GACAGCGGGG TCTTTGCAGC CTCGGGTGCA GACCACCAGA TCACACAGTG 

1151 GGACCTGGCA GTGGAGCGGG ACCCTGAGGC GGGCGACGTG GAGGCCGACC 

1201 CCGGACTGGC CGACCTCCCG CAGCAGCTGC TGTTCGTGCA CCAGGGCGAG 

1251 ACCGAGCTGA AGGAGCTGCA CTGGCACCCG CAGTGCCCAG GGCTCCTGGT 

1301 CAGCACGGCG CTGTCAGGCT TCACCATCTT CCGCACCATC AGCGTCTGAG 

1351 GCGTCCCACT GGCTCTGATC TTGCTTCCTG CTTGGAAACT GAAGTCGAAT 

1401 TGGGCTCCCC TGGAAGGGGT TCATTCAGGT CTGTTGACTG AGACTGGCCG 

1451 GCCTGTGGGC TGCCGTGATG GATTCTGTTT GACGTATTGT TCTCTAGAAG 

1501 GCCTGGCTCT GATCCAGTGA CCCCTCTCAC CAAAGAACTC GGTTTAACCA 

1551 GGGCTCTGTA AGACCACTCC CACCCAGAGA CTTGTGTGGC CTGGTGTGGC 

1601 CTGTGTGTCG GATTCCTTCC TGTCAGCTGT GACCCATTTG ACCTGTGTCC 

1651 CCAGAACCCA GTTTTTTGTT TGTTTGTTTG AGACGGAGTC TTGGTCTGTC 

1701 GCCCAGGCTG GAGTGCAGTA GCACGATCTT GGCTCACTGC AACCTCCGCC 

1751 TCCTGGGTTA AAGTGATTCT CTCAGCTCAG TCTCCCAGGT AGCTGGGATT 

1801 ACAGGCATGT GCCACCACAC CCCGTTAATT TTTGTATTTT TAGTAGAGAC 

1851 GGGGTTTCAC CATGTTGGCC AGGCTGGTCT CAAATTCTTG ATCTCAAGTG 

1901 ATCTGTCCGC CCCGGCCTCC CAGAGTGCTG GGTTGGGATT ACAGGCGTGA 

1951 GCCACCGCGT CCGGCTCAGG ACCCAGTTTT GGCTGCTGGT TCCCAGCAGG 

2001 GGACTCGGGG GATATACAGT GGCTGCACCA AATTGGAGGT GTGGGTTCCT 

2051 CCAACACAAT TTGCTTCTGC CCGTTGTCTT CCTGCCAGCT GGGTTTGGCC 

2101 AGGATTTCTC CGTGTGGGGG CTACATGCGA CCCTCTCCCC TCCTCCCTGA 

2151 CTTTAGAGGC TGGTGCTGTG TCGGGAGGAA GGTCAGGGCT CCTGAGCAGC 

2201 AATAAAGGAC CAGGAAGAGG CCTGAGGTGG AAAAAAAAAA AAAAAAAA 



BLAST Results 



No BLAST result 
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Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 9 bp to 1346 bp; peptide length: 446
Category: similarity to known protein 
Classification: unset 
Prosite motifs: WD_REPEATS (323-338)



1 MAARKGRRRT CETGEPMEAE SGDTSSEGPA QVYLPGRGPP LREGEELVMD 

51 EEAYVLYHRA QTGAPCLSFD IVRDHLGDNR TELPLTLYLC AGTQAESAQS 

101 NRLMMLRMHN LHGTKPPPSE GSDEEEEEED EEDEEERKPQ LELAMVPHYG 

151 GINRVRVSWL GEEPVAGVWS EKGQVEVFAL RRLLQVVEEP QALAAFLRDE 

201 QAQMKPIFSF AGHMGEGFAL DWSPRVTGRL LTGDCQKNIH LWTPTDGGSW 

251 HVDQRPFVGH TRSVEDLQWS PTENTVFASC SADASIRIWD IRAAPSKACM 

301 LTTVTAHDGD VNVISWSRRE PFLLSGGDDG ALKIWDLRQF KSGSPVATFK 

351 QHVAPVTSVE WHPQDSGVFA ASGADHQITQ WDLAVERDPE AGDVEADPGL 

401 ADLPQQLLFV HQGETELKEL HWHPQCPGLL VSTALSGFTI FRTISV 



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2cl7, frame 3

TREMBL:AC005917_14 gene: "F3P11.14"; product: "putative WD-40 repeat 
protein"; Arabidopsis thaliana chromosome II BAC F3P11 genomic 
sequence, complete sequence., N = 1, Score = 910, P = 2.7e-91

PIR:S53061 hypothetical protein YMR131C - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 691, P = 4.3e-68 

PIR:I49367 retinoblastoma-binding protein mRbAp46 - mouse, N = 1, Score
= 338, P = 1.1e-30

PIR:I39181 retinoblastoma-binding protein RbAp46 - human, N = 1, Score
= 338, P = 1.1e-30



>TREMBL:AC005917_14 gene: "F3P11.14"; product: "putative WD-40 repeat 

protein"; Arabidopsis thaliana chromosome II BAC F3P11 genomic sequence, 
complete sequence. 

Length = 469

HSPs: 



Score = 910 (136.5 bits), Expect = 2.7e-91, P = 2.7e-91
Identities = 195/442 (44%), Positives = 259/442 (58%)



Query: 18 EAESGDTSSEGPAQVYLPGRGPPLREGEELVMDEEAYVLYHRAQTGAPCLSFDIVRDHLG 77

EA S + S P +V+ PG L +GEEL D AY H G PCLSFDI+ D LG

Sbjct: 18 EASSSEIPSI-PTRVWQPGVDT-LEDGEELQCDPSAYNSLHGFHVGWPCLSFDILGDKLG 75

Query: 78 DNRTELPLTLYLCAGTQAESAQSNRLMMLRMHNLHGTKP PPSEGSDEEEEEEDEED- 133

NRTE P TLY+ AGTQAE A N + + ++ N+ G + P + G+ E+E+E+DE+D

Sbjct: 76 LNRTEFPHTLYMVAGTQAEKAAHNSIGLFKITNVSGKRRDVVPKTFGNGEDEDEDDEDDS 135

Query: 134 EEERKPQLELAMVPHYGGINRVRVSWLGEEPVAGVWSEKGQVEVFALRRLLQ 185

E + P.+++ V H+G +NR+R + W++ G V+V+ + L

Sbjct: 136 DSDDDDGDEASKTPNIQVRRVAHHGCVNRIRAMPQNSH-ICVSWADSGHVQVWDMSSHLN 194

Query: 186 VVEEPQALAAFLRDEQAQMKPIFSFAGHMGEGFALDWSPRVTGRLLTGDCQKNIHLWTPT 245

+ E + P+ +F+GH EG+A+DWSP GRLL+GDC+ IHLW P

Sbjct: 195 ALAESETEGKDGTSPVLNQAPLVNFSGHKDEGYAIDWSPATAGRLLSGDCKSMIHLWEPA 254

Query: 246 DGGSWHVDQRPFVGHTRSVEDLQWSPTENTVFASCSADASIRIWDIRAAPSKACMLTTVT 305

G SW VD PF GHT SVEDLQWSP E VFASCS D S+ +WDIR S A +

Sbjct: 255 SG-SWAVDPIPFAGHTASVEDLQWSPAEENVFASCSVDGSVAVWDIRLGKSPAL SFK 310

Query: 306 AHDGDVNVISWSRREPFLL-SGGDDGALKIWDLRQFKSGSPV-ATFKQHVAPVTSVEWHP 363

AH+ DVNVISW+R +L SG DDG I DLR KG V A F+ H P+TS+EW

Sbjct: 311 AHNADVNVISWNRLASCMLASGSDDGTFSIRDLRLIKGGDAVVAHFEYHKHPITSIEWSA 370

Query: 364 QDSGVFAASGADHQITQWDLAVERDPE AGDVEADPGLADLPQQLLFVHQGETEL 417

++ A + D+Q+T WDL++E+D E A E DLP QLLFVHQG+ +L 

Sbjct: 371 HEASTLAVTSGDNQLTIWDLSLEKDEEEEAEFNAQTKELVNTPQDLPPQLLFVHQGQKDL 430

Query: 418 KELHWHPQCPGLLVSTALSGFTIFRTISV 446

KELHWH Q PG+++STA GF I ++ 
Sbjct: 431 KELHWHNQIPGMIISTAGDGFNILMPYNI 459
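The identity and positive percentages in these HSP summaries are consistent with the fraction being truncated, not rounded, to a whole percent: 259/442 is 58.6 % but is reported as 58 %, and 116/234 (49.6 %) is reported as 49 %. This is an observation from the figures in this listing, not documented BLAST behaviour. A sketch that reproduces the reported values under that assumption, with illustrative function names:

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated as the HSP summaries here appear to do."""
    return matches * 100 // aligned


def rounded_percent(matches: int, aligned: int) -> float:
    """Conventional percentage rounded to one decimal, for comparison."""
    return round(100 * matches / aligned, 1)


if __name__ == "__main__":
    for label, m, n in [("identities", 195, 442), ("positives", 259, 442),
                        ("identities", 59, 234), ("positives", 116, 234)]:
        print(label, f"{m}/{n}", blast_percent(m, n), rounded_percent(m, n))
```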



Pedant information for DKFZphfbr2_2cl7, frame 3 



Report for DKFZphfbr2_2cl7.3



[LENGTH] 446

[MW] 49447.38

[PI] 4.82

[HOMOL] TREMBL:AC005917_14 gene: "F3P11.14"; product: "putative WD-40 repeat protein"; Arabidopsis thaliana chromosome II BAC F3P11 genomic sequence, complete sequence. 1e-90

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YMR131C] 4e-65

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YEL056w] 4e-15

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YEL056w] 4e-15

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YEL056w] 4e-15

[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YBR195c] 2e-13

[FUNCAT] 10.04.09 regulation of g-protein activity [S. cerevisiae, YBR195c] 2e-13

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR195C] 2e-13

[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YBR195c] 2e-13

[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YBR195c] 2e-13

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YPR178w] 1e-11

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-11

[FUNCAT] 06.13 proteolysis [S. cerevisiae, YGL003c] 4e-09

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL003c] 4e-09

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL145C] 5e-09

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 5e-09

[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 6e-09

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YMR116c] 5e-08

[FUNCAT] 02.16 fermentation [S. cerevisiae, YMR116c] 5e-08

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YLR429w] 3e-07

[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR142c] 3e-06

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR142c] 3e-06

[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YDR142c] 3e-06

[FUNCAT] 03.13 meiosis [S. cerevisiae, YLR129w] 4e-06

[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YER107c] 4e-06

[FUNCAT] 03.01 cell growth [S. cerevisiae, YKL021c] 4e-06

[FUNCAT] 04.07 rna transport [S. cerevisiae, YER107c] 4e-06

[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 2e-05

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YCR057c] 2e-05

[FUNCAT] 01.01.04 regulation of amino-acid metabolism [S. cerevisiae, YIL046w] 2e-05

[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YIL046w] 2e-05

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YLL011w] 3e-05

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YOR212w] 5e-05

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YOR212w] 5e-05

[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YOR212w] 5e-05

[BLOCKS] BL00678

[SCOP] d2trcb_ 2.51.3.1.1 Transducin (heterotrimeric G protein), gamm 5e-29

[PIRKW] plasma 6e-07

[PIRKW] duplication 4e-12

[PIRKW] hormone 6e-07

[PIRKW] transmembrane protein 1e-07

[PIRKW] stomach 6e-07

[PIRKW] actin binding 1e-07

[PIRKW] leucine zipper 1e-07

[PIRKW] signal transduction 2e-06

[PIRKW] heterotrimer 2e-06

[PIRKW] peripheral membrane protein 6e-07

[PIRKW] GTP binding 2e-06

[SUPFAM] WD repeat homology 1e-63

[SUPFAM] yeast coatomer complex alpha chain 1e-07

[SUPFAM] GTP-binding regulatory protein beta chain 4e-07

[SUPFAM] PRL1 protein 8e-09
[SUPFAM] MSI1 protein 4e-12

[SUPFAM] coatomer complex beta' chain le-09 

[PROSITE] WD_REPEATS 1

[PFAM] WD domain, G-beta repeats 

[KW] All_Beta 

[KW] 3D

[KW] LOW_COMPLEXITY 3.14 %



SEQ MAARKGRRRTCETGEPMEAESGDTSSEGPAQVYLPGRGPPLREGEELVMDEEAYVLYHRA
SEG
1gotB

SEQ QTGAPCLSFDIVRDHLGDNRTELPLTLYLCAGTQAESAQSNRLMMLRMHNLHGTKPPPSE
SEG
1gotB

SEQ GSDEEEEEEDEEDEEERKPQLELAMVPHYGGINRVRVSWLGEEPVAGVWSEKGQVEVFAL
SEG . . xxxxxxxxxxxxxx
1gotB

SEQ RRLLQVVEEPQALAAFLRDEQAQMKPIFSFAGHMGEGFALDWSPRVTGRLLTGDCQKNIH
SEG
1gotB EEECCCCCEEEEEETTT-TCEEEEEETTTEEE

SEQ LWTPTDGGSWHVDQRPFVGHTRSVEDLQWSPTENTVFASCSADASIRIWDIRAAPSKACM
SEG
1gotB EEETTTT CEEEEEECCCCCEEEEEEETTTCE-EEEEETTTEEEEEETTT -- TEEEE

SEQ LTTVTAHDGDVNVISWSRREPFLLSGGDDGALKIWDLRQFKSGSPVATFKQHVAPVTSVE
SEG
1gotB EECBTTBTCCEEEEEETTTTTEEEEEETTTEEEEEE

SEQ WHPQDSGVFAASGADHQITQWDLAVERDPEAGDVEADPGLADLPQQLLFVHQGETELKEL
SEG
1gotB

SEQ HWHPQCPGLLVSTALSGFTIFRTISV
SEG
1gotB



Prosite for DKFZphfbr2_2cl7.3

PS00678 323->338 WD_REPEATS PDOC00574



Pfam for DKFZphfbr2_2cl7.3

HMM_NAME WD domain, G-beta repeats

HMM *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD*
++GH+ V ++ +SP + +++S S D ++R+WD
Query 257 FVGHTRSVEDLQWSPTENTVFASCSADASIRIWD 290

24.88 304 336 1

Alignment to HMM consensus:
Query *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD
+ H++tV+ +++S + ++SG++DG +++WD
dkfzphfbr2 304 VTAHDGDVNVISWSRREPF-LLSGGDDGALKIWD 336

34 dkfzphfbr2_2cl7.3 similarity to YMR131C and retinoblastoma-binding protein RbAp46
DKFZphfbr2_2cl8



group: brain associated 

DKFZphfbr2_2cl8 encodes a novel 302 amino acid protein with weak similarity to cyclin-dependent
kinase p130-PITSLRE.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 

weak similarity to cyclin-dependent kinase p130-PITSLRE

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 2835 bp 

Poly A stretch at pos. 2817, polyadenylation signal at pos. 2796 

1 TGGGGCGGAC GGCGAGGGAG TCCAGAGCCT TGAGCCCGGT GCTCCTCCCT 

51 CGCGCAGCGG TGGCTCTGCG GCCGCTGGAG TAAACACTGC CTTTGTTCCC 

101 TAGCGCCTCG TCTTTCGTCG CCCCGTGCCC TCACGCCGCC GGGCTCTGGC 

151 CGGCCCGCCC TCGGTCCTTG AACCCCATTT CGGCTCGTGC CGTGCGGATG 

201 CAGCTGCCGG GCCTGGGTTT GGGCATTGAG CGGGAGGAGG AGGAGGAGCG 

251 GCGGCGCCTG GGCGGCATGC GATGGGGAAC TGCTGCTGGA CGCAGTGCTT 

301 CGGACTGCTT CGCAAGGAAG CGGGGCGGCT GCAGCGAGTA GGCGGCGGCG 

351 GAGGATCCAA GTATTTTAGA ACATGCTCAA GAGGTGAGCA CTTGACAATA 

401 GAGTTTGAGA ATCTAGTAGA AAGTGATGAA GGGGAGAGCC CAGGAAGCAG 

451 TCATAGGCCT CTTACTGAGG AAGAAATTGT TGACCTAAGA GAAAGGCATT 

501 ATGATTCCAT TGCCGAAAAA CAAAAAGATC TTGATGAGAA AATTCAAAAA 

551 GAGTTAGCCT TACAAGAAGA GAAGTTAAGA CTAGAAGAAG AAGCTTTATA 

601 CGCTGCACAG CGTGAAGCAG CCAGGGCAGC AAAGCAGCGA AAGCTCTTGG 

651 AGCAAGAAAG GCAGAGAATT GTGCAGCAAT ATCATCCTTC CAACAATGGA 

701 GAATATCAAA GTTCAGGACC AGAAGATGAC TTCGAATCTT GTTTGAGAAA 

751 TATGAAGTCA CAGTATGAAG TTTTTCGAAG TAGTAGACTC TCATCAGATG 

801 CTACAGTTTT GACACCAAAT ACAGAAAGCA GTTGTGATTT AATGACCAAA 

851 ACTAAATCAA CTAGTGGAAA TGACGACAGC ACATCCTTAG ATCTAGAGTG 

901 GGAAGATGAA GAAGGAATGA ATAGAATGCT TCCAATGAGA GAACGTTCCA 

951 AAACAGAGGA AGACATTCTA CGGGCAGCAC TTAAGTATAG CAACAAGAAG 

1001 ACTGGAAGTA ATCCTACATC AGCCTCTGAT GATTCCAATG GGCTGGAGTG 

1051 GGAAAATGAT TTTGTTAGTG CCGAAATGGA TGATAATGGA AATTCCGAGT 

1101 ATTCTGGATT TGTAAATCCT GTATTAGAAC TGTCTGATTC TGGCATAAGG 

1151 CATTCTGACA CAGATCAACA GACTCGATAG GGTAAAATTG TGTGACCTTG 

1201 TTTATCAGTT ATGACCAAAT GTTAAAAACC AACTAGAATG TATAAGTGAT 

1251 TGTGCTTAGC CTTTTTGTAA GGGAGATGTG TAAGAAACCA TGCTGTAAAT 

1301 GCTTATTTTA TTACAAAGGA GTAGGGATGA TAGGATCTGA ATTGATACAG 

1351 AATTAAGTGC AATTTCATCA TCTGCCTTCT GCTTTTCAAG ACCAATTTAA 

1401 TGGTCCTGTC ATGTTACTGA TTAAATTTAC TTTGTCTTGT CTTTATAGCA 

1451 TTTCTGTTTA CTATGGTAGA TTTCCACTTT CAATTTTTAA AATTAATTTT 

1501 ACTTTGAATG ATTTATGAAG CCTATTTCAT TGTCTAACTA TGAAAATATT 

1551 AAGACTTTTT TGTTAATTCT CAGCCGATGT GAAGGAAGCA TGAGGAGGGA 

1601 TCGTCAGACT CAGATTTAGA ATAGTGTTCC CGTTTCCAGC ATTATTTATT 

1651 TCTATGACTT CTTTGGATTT TATTATCTAA TAGTAAGTAC AGTTGATGTG 

1701 GGTAGATGAC TCTAAGAAAT GCTGAAGTAT CGGCATTACA TGTGTTTATT 

1751 TACATGTCCT AGTTTGATAA TGTTGATTCA ATCTGAACAA AAGATAATAT 

1801 AAAAATAACC CTTCAGAGTT TGGACATTTC AAGTTGGTAA TAATAAAAAA 

1851 TAATATTTAA GAAGATATAT ATATATATAT ATTTAGTTTT TTCCACTTCA 

1901 TTTTACATGC CACTATATTG ACTTTAATTG ATATACAGTA TTAAGTTTTT 

1951 AGGTGCCATT ATTTTTAAAA AATTCTATAT TTCCAATGAA CGATGTTAGA 

2001 TTTTACACAG AACATATTCT CTGCATGATT TCAGAAAAGA AAATCTAAAA 

2051 AGGTAATACG GGTATTTCAA ATAAAATCCT TTCTGGTATG AAAGGCTCCA 

2101 TTGATTTTAT TAAGCCTTCC TTTACCTTGT AGTACAAGGT GCTTTAATGG 

2151 GATAGAACTA AGCATATCAA TATCTATAAC TGCATTTTGT GCTAGACAAT 

2201 TACTGTTCTT TTCTCTAAAA TGTATATGTC AATTTACAAG GCCAGGGATA 

2251 GAAAACACTC CATAATTGCT TTCCTTGATT TTGCTGAGGA TTTGGTATGA 

2301 TTTTAGTAAG CAAACTGTTT TTTGGTTTTT CCTTAATGTT TTTAATTTTT 

2351 TTTCCTCTTG CAACAATGAC GGTGCATGTT CTTATAAATA TAGGAAGGTC 

2401 CAGATATAAA TAGTAACCTA AAGTTCTTGC TGTGCTTAAA AAAAAAAATC 

2451 ATGTGGCTCT TTCAATATTT GAACTGCTAA GCAATGACAT CTGTAGTTTT 

2501 ATCTCCTTTT TTATGTCATA GAAATTAATA TGATACTTTA AATATGTAAA 

2551 TATAATACAT TGGTAATGCT ATTATTTATA TCTGTCTTAA CATAATTTAA

2601 GTTGTAGCTG TGTCTTGGAA ATATTTTTAA GGTAATCTAT ATTCACATTG 

2651 CCTGTGTTAA TGCTTTTTAA GGTTTGTATA CATCAGATGT ATATTTTTGG 
2701 TTTGGCATAA GCTACGATTG TAATTTTTCT TGGCTTTTTG TTCATAAAGA
2751 ATTTTTTGAA GGAATGGTAA CAAATGGTAA TTTACAAATG GTTGTGAATA 
2801 AACACATTTT TACACTTAAA AAAAAAAAAA AAAAA 



BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 272 bp to 1177 bp; peptide length: 302 
Category: similarity to known protein 



1 MGNCCWTQCF GLLRKEAGRL QRVGGGGGSK YFRTCSRGEH LTIEFENLVE 

51 SDEGESPGSS HRPLTEEEIV DLRERHYDSI AEKQKDLDEK IQKELALQEE 

101 KLRLEEEALY AAQREAARAA KQRKLLEQER QRIVQQYHPS NNGEYQSSGP 

151 EDDFESCLRN MKSQYEVFRS SRLSSDATVL TPNTESSCDL MTKTKSTSGN 

201 DDSTSLDLEW EDEEGMNRML PMRERSKTEE DILRAALKYS NKKTGSNPTS

251 ASDDSNGLEW ENDFVSAEMD DNGNSEYSGF VNPVLELSDS GIRHSDTDQQ 

301 TR 
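The ORF coordinates and the peptide length are tied together by codon arithmetic. Here (1177 - 272 + 1) / 3 = 302, and the stop codon TAG occupies nucleotides 1178-1180 of the listing, so the ORF end given above is the last coding nucleotide rather than the end of the stop codon. The Python sketch below checks that arithmetic and translates the first 16 codons of the ORF from the listed nucleotides. It assumes the standard genetic code (NCBI translation table 1); the helper names are illustrative only.

```python
"""Sketch: relate ORF coordinates to peptide length and translate a fragment.

Assumes the standard genetic code (NCBI table 1) and 1-based, inclusive
ORF coordinates that exclude the stop codon, as in the entries here.
"""

BASES = "TCAG"
# Amino acids for codons in TCAG x TCAG x TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in a 1-based, inclusive ORF span."""
    span = orf_end - orf_start + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


def translate(dna: str) -> str:
    """Translate complete codons; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Listing positions 251-320 of DKFZphfbr2_2cl8.
FRAGMENT = (
    "GCGGCGCCTG" "GGCGGCATGC" "GATGGGGAAC" "TGCTGCTGGA" "CGCAGTGCTT"
    "CGGACTGCTT" "CGCAAGGAAG"
)
FRAGMENT_START = 251

offset = 272 - FRAGMENT_START  # 0-based index of the ATG within FRAGMENT
print(peptide_length(272, 1177))                 # 302
print(translate(FRAGMENT[offset:offset + 48]))   # MGNCCWTQCFGLLRKE
```

The same length check holds for the other entries in this section, for example (922 - 47 + 1) / 3 = 292 for DKFZphfbr2_2dl7.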

BLASTP hits 

Entry A55817 from database PIR: 
cyclin-dependent kinase p130-PITSLRE - mouse
Length = 783

Score = 123 (43.3 bits), Expect = 0.00013, P = 0.00013
Identities = 53/197 (26%), Positives = 96/197 (48%)



Alert BLASTP hits for DKFZphfbr2_2cl8, frame 2 
No Alert BLASTP hits found 
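The percentages printed next to the identity and positive counts in the BLASTP summaries of this section appear to be truncated rather than rounded: 96/197 is about 48.7 % and 69/147 about 46.9 %, yet they are printed as 48 % and 46 %. The Python sketch below reproduces every such percentage in this section under that assumption; the truncation rule is inferred from these numbers, not taken from BLAST documentation.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage truncated toward zero (inferred from the printed values)."""
    return 100 * matches // aligned


# (matches, aligned length, percentage as printed in this section)
PRINTED = [
    (53, 197, 26), (96, 197, 48),    # DKFZphfbr2_2cl8 vs PIR A55817
    (258, 377, 68), (283, 377, 75),  # DKFZphfbr2_2dl5 vs TREMBL AF042180_1
    (33, 147, 22), (69, 147, 46),    # DKFZphfbr2_2dl7 vs PIR S67436
    (86, 86, 100),                   # DKFZphfbr2_2gl8 vs TREMBLNEW HS30M3_2
]

for matches, aligned, shown in PRINTED:
    computed = blast_percent(matches, aligned)
    print(f"{matches}/{aligned}: computed {computed}%, printed {shown}%")
```

Ordinary rounding would instead give 49 % for 96/197, which is why truncation is assumed here.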

Pedant information for DKFZphfbr2_2cl8, frame 2 

Report for DKFZphfbr2_2cl8.2

[LENGTH] 302 

[MW] 34281.39

[pI] 4.73

[PROSITE] MYRISTYL 5

[PROSITE] CK2_PHOSPHO_SITE 12

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 3 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 13.58 % 

[KW] COILED_COIL 13.58 %

SEQ MGNCCWTQCFGLLRKEAGRLQRVGGGGGSKYFRTCSRGEHLTIEFENLVESDEGESPGSS

SEG xxxxx 

PRD ccccccccchhhhhhhhhheeecccccccceeeeccccccchhhhhhhhccccccccccc 
COILS 

SEQ HRPLTEEEIVDLRERHYDSIAEKQKDLDEKIQKELALQEEKLRLEEEALYAAQREAARAA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ KQRKLLEQERQRIVQQYHPSNNGEYQSSGPEDDFESCLRNMKSQYEVFRSSRLSSDATVL 

SEG xxxxxxx 

PRD hhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhheeeeecccccceeee 

COILS CCCCCCCCC 






WO 01/12659 



PCT/ffiOO/01496 



SEQ TPNTESSCDLMTKTKSTSGNDDSTSLDLEWEDEEGMNRMLPMRERSKTEEDILRAALKYS 

SEG 

PRD ccccccccccccccccccccccccchhhhhhhccccccchhhhhhhcchhhhhhhhhhhc 

COILS 

SEQ NKKTGSNPTSASDDSNGLEWENDFVSAEMDDNGNSEYSGFVNPVLELSDSGIRHSDTDQQ

SEG 

PRD cccccccccccccccccccccccceeeecccccccccccccceeeecccccccccccccc 

COILS 



SEQ TR

SEG

PRD cc

COILS



Prosite for DKFZphfbr2_2cl8.2



PS00005  60->63    PKC_PHOSPHO_SITE  PDOC00005
PS00005  170->173  PKC_PHOSPHO_SITE  PDOC00005
PS00005  240->243  PKC_PHOSPHO_SITE  PDOC00005
PS00006  36->40    CK2_PHOSPHO_SITE  PDOC00006
PS00006  65->69    CK2_PHOSPHO_SITE  PDOC00006
PS00006  79->83    CK2_PHOSPHO_SITE  PDOC00006
PS00006  148->152  CK2_PHOSPHO_SITE  PDOC00006
PS00006  163->167  CK2_PHOSPHO_SITE  PDOC00006
PS00006  186->190  CK2_PHOSPHO_SITE  PDOC00006
PS00006  198->202  CK2_PHOSPHO_SITE  PDOC00006
PS00006  204->208  CK2_PHOSPHO_SITE  PDOC00006
PS00006  226->230  CK2_PHOSPHO_SITE  PDOC00006
PS00006  228->232  CK2_PHOSPHO_SITE  PDOC00006
PS00006  250->254  CK2_PHOSPHO_SITE  PDOC00006
PS00006  295->299  CK2_PHOSPHO_SITE  PDOC00006
PS00007  103->111  TYR_PHOSPHO_SITE  PDOC00007
PS00007  103->111  TYR_PHOSPHO_SITE  PDOC00007
PS00008  24->30    MYRISTYL          PDOC00008
PS00008  25->31    MYRISTYL          PDOC00008
PS00008  199->205  MYRISTYL          PDOC00008
PS00008  245->251  MYRISTYL          PDOC00008
PS00008  291->297  MYRISTYL          PDOC00008
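The PKC, CK2 and N-myristoylation entries in the table above come from PROSITE patterns, so they can be rechecked with regular expressions. The Python sketch below rewrites PS00005 ([ST]-x-[RK]), PS00006 ([ST]-x(2)-[DE]) and PS00008 (G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}) as regexes and scans the frame 2 peptide with a zero-width lookahead, so that overlapping sites such as the CK2 hits starting at 226 and 228 are all kept. It prints 1-based start positions. Comparing them with the table suggests each printed end lies one past the last matched residue (60->63 covers S60-H61-R62), but that reading is inferred, not documented. The peptide string is copied from the Pedant SEQ rows of this report.

```python
import re
from typing import List

# PROSITE patterns rewritten as Python regexes: x -> ".", x(n) -> ".{n}", {..} -> [^..].
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",               # PS00005
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",            # PS00006
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",   # PS00008
}

PEPTIDE_2CL8 = (
    "MGNCCWTQCF" "GLLRKEAGRL" "QRVGGGGGSK" "YFRTCSRGEH" "LTIEFENLVE"
    "SDEGESPGSS" "HRPLTEEEIV" "DLRERHYDSI" "AEKQKDLDEK" "IQKELALQEE"
    "KLRLEEEALY" "AAQREAARAA" "KQRKLLEQER" "QRIVQQYHPS" "NNGEYQSSGP"
    "EDDFESCLRN" "MKSQYEVFRS" "SRLSSDATVL" "TPNTESSCDL" "MTKTKSTSGN"
    "DDSTSLDLEW" "EDEEGMNRML" "PMRERSKTEE" "DILRAALKYS" "NKKTGSNPTS"
    "ASDDSNGLEW" "ENDFVSAEMD" "DNGNSEYSGF" "VNPVLELSDS" "GIRHSDTDQQ"
    "TR"
)


def scan(peptide: str, regex: str) -> List[int]:
    """1-based start positions of every match, including overlapping ones."""
    lookahead = "(?=" + regex + ")"  # zero-width, so every start position is tried
    return [m.start() + 1 for m in re.finditer(lookahead, peptide)]


for name, regex in PATTERNS.items():
    print(name, scan(PEPTIDE_2CL8, regex))
```

The start positions it finds agree with the PKC, CK2 and MYRISTYL rows of the table. The two TYR_PHOSPHO_SITE rows at 103->111 are left out because that pattern has variable spacers, and a one-match-per-start scan like this one would report only one of them.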



(No Pfam data available for DKFZphfbr2_2cl8.2)
DKFZphfbr2_2dl5



group: differentiation/development 

DKFZphfbr2_2dl5 encodes a novel 438 amino acid protein with similarity to Mus musculus
testis-specific Y-encoded-like protein (Tspyll).

The TSPY genes are arranged in clusters on the Y chromosome of many mammalian species. TSPY is
believed to function in early spermatogenesis and is a candidate for GBY, the putative
gonadoblastoma-inducing gene on the Y chromosome. The novel protein is a new member of the
TSPY-SET-NAP1L1 family, which comprises proteins closely related to TSPY; the new protein may
therefore be involved in early spermatogenesis.

The new protein can find application in modulating early spermatogenesis. 



strong similarity to testis-specific Y-encoded-like protein 

complete cDNA, complete cds, EST hits 
localisation: primer B does not match perfectly

Sequenced by Qiagen 

Locus: /map="729.2 cR from top of Chr6 linkage group"
Insert length: 3229 bp 

Poly A stretch at pos. 3206, polyadenylation signal at pos. 3184 



1 GGAGACTGTA GGGTGGGCGG TGCGAGCGGC GGTTAGCTCC CAGTTCGGCC
51 TCTGAGGAAA ACGGGCGTTC GCCTGCGGTT GGTCCGACTG TTAGCAACAT
101 GAGCGGCCTG GATGGGGTCA AGAGGACCAC TCCCCTCCAA ACCCACAGCA
151 TCATTATTTC TGACCAAGTC CCGAGCGACC AGGACGCACA CCAGTACCTG
201 AGGCTCCGCG ACCAAAGCGA GGCGACACAG GTGATGGCGG AGCCGGGTGA
251 GGGAGGCTCG GAGACCGTCG CGCTCCCGCC TTCACCGCCT TCAGAGGAGG
301 GGGGCGTACC CCAGGATCCC GCGGGCCGTG GCGGTACTCC CCAGATCCGA
351 GTTGTTGGGG GTCGCGGTCA TGTGGCGATC AAAGCCGGGC AGGAAGAGGG
401 CCAGCCTCCC GCCGAAGGCC TGGCAGCCGC TTCTGTGGTG ATGGCAGCCG
451 ACCGCAGCCT GAAAAAGGGC GTTCAGGGTG GAGAGAAGGC CCTAGAAATC
501 TGTGGCGCCC AGAGATCCGC GTCTGAGCTG ACGGCGGGGG CGGAGGCTGA
551 GGCGGAGGAG GTGAAGACAG GAAAGTGCGC CACCGTCTCA GCAGCCGTGG
601 CTGAGAGGGA GAGCGCTGAG GTGGTGGTGA AGGAAGGCCT GGCGGAGAAG
651 GAGGTAATGG AGGAGCAGAT GGAGGTAGAG GAGCAGCCGC CAGAAGGTGA
701 AGAAATAGAA GTGGCGGAGG AGGATAGATT GGAGGAGGAG GCGAGGGAGG
751 AAGAAGGGCC CTGGCCTTTG CATGAGGCTC TCCGCATGGA CCCTCTGGAG
801 GCCATCCAGC TGGAACTGGA CACTGTGAAT GCTCAGGCCG ACAGGGCCTT
851 CCAACAGCTG GAGCACAAGT TTGGGCGGAT GCGTCGACAC TACCTGGAGC
901 GGAGGAACTA CATCATTCAG AATATCCCGG GCTTCTGGAT GACTGCTTTT
951 CGAAACCACC CCCAGTTGTC CGCCATGATT AGGGGCCAAG ATGCAGAGAT
1001 GTTAAGGTAC ATAACCAATT TAGAGGTGAA GGAACTCAGA CACCCTAGAA
1051 CCGGTTGCAA GTTCAAGTTC TTCTTTAGAA GAAACCCCTA CTTCAGAAAC
1101 AAGCTGATTG TCAAGGAATA TGAGGTAAGA TCCTCCGGCC GAGTGGTGTC
1151 TCTTTCTACT CCAATTATAT GGCGCAGGGG GCATGAACCC CAGTCCTTCA
1201 TTCGCAGAAA CCAAGACCTC ATCTGCAGCT TCTTCACTTG GTTTTCAGAC
1251 CACAGCCTTC CAGAGTCCGA CAAAATTGCT GAGATTATTA AAGAGGATCT
1301 GTGGCCAAAT CCACTGCAAT ACTACCTGTT GCGTGAAGGA GTCCGTAGAG
1351 CCCGACGTCG CCCGCTAAGG GAGCCTGTAG AGATCCCCAG GCCCTTTGGG
1401 TTCCAGTCTG GTTAACATTT GCCCTTGGGA ATACTCCTGC ACAAGGTCTC
1451 CTACCACCTT CTGCTGGACC TGTGCTTGGG CATCAGCAAT GAGTATGCCT
1501 TCTATTGTGC TTTGTTTTTG CTGACTTTTC TGCACCCTGT TTCCTTTGGA
1551 TATTCAGTTC TCTCAACCTC AAGATTGAGA CGGTGGTGGG TATGCTTCTC
1601 CACTTCCATA TGACCTTCAT GCTGTTCTGG AATATCACAT GCTACGAGGT
1651 CATCCTTCAC ACTACTTGTA AGCCAAGCAA ATGATACTGT AGATTGTACT
1701 GCCTTTATCT GCACTGCTTG GACCCTGTTT ATTCCCAGGG CCTCTGAACT
1751 GGTTGCTGTC ACTTGGATTT CTAGCTTTGG GAGCCTGTTC CACCTACTCA
1801 GCTCTGCATT GAGCAGTATG GGCACATGCC CTGTGGACAG TTACTGGACG
1851 TTAATGAACT CAGAGGAGAA AAGCAGTGAG CCACTTGTTC TGTGTGATTT
1901 ATGGTACTTC ATTGCTCTTC CTTCACCTCT AGTCACTTTC TATTGCTACC
1951 TGCCCTACAT TGGCTCCTGC CAAGGTCCCT CTCTCTCCCT GTTTTCCTTT
2001 XTTTTTTTTT TTTTTTTTTT TTTTGAGACG GAGGACGGAG TCTTGCTCTG
2051 TCGCCCAGGT TGGAGTGCAG TGGCGCGATC TCGGCTCACT GCAACCTCCA
2101 CCTCCCGGGT TCAAGCGATT CTCCTGCCTC AGCCTCCCGA GTAGCTGGGA
2151 CTACAGGCGC GCGCCGCCAC GCCCGGCTAA TTTTTATATT TTTAGTAGAG
2201 ACGGGGTTTC ACCATGCTGG CCAGGCTGGT CTCGAACCCC GACCTCGTGA
2251 TCCGCCCTCC TTAGCCTCCC AATCCTCTCT TAAAAAAGTG ATAGCTCAGA
2301 AATATTTGTA AAAGCAAGGT TTTTATTTCA TTTTGGCTCT GTCATTTTCA
2351 GAGGCAAAGA AGTTGGCCTG TAAAATAGAG TGCTAGAGCT CTTACGCCCC
2401 TCCCCTTCTT CCCAACTTCC TACTTCCTAG CCCTTTTATC AACTCCTAGA
2451 ATAGTTAAAG AGAGACACAT CTAGATGGGA TGAAAGGTGC CCTAAGCAGG
2501 AGAAACTGAA CAAAAGGCTA GAGGCATGGG CCAGGTAAAA ATTGGGCCTA
2551 GAGTGAAGAC TGTGCTGCCG TTAAGAGCTT TCGAGGAAGG AGTACTTACT 
2601 CCCCAATGAT GATGAATGGA GAAATACTTT TCAGGGAGAA TTGAAGGGGT 
2651 TAAAGTGTTA AATATGTTGC CTAGACAAGG GTTCTTTAAA GAAAGACAGC 
2701 GCAACTTTGA ATGCTTTCTT ACTTGTTTTG TGACCTAATT TATGTGGAAG 
2751 ATTGTTATTT CATTAGGATT TAGTAAAATT TTTTTTTCTG ATTCTAAACT 
2801 TATTGTGAAA ATTGAGCTGT ACAGATATTC TTTTGATTTC AATTGGGAAC
2851 ATTTGGAAGA ACAACAGTCT TACTTGCCTG TACAATATAG AGACATATGA 
2901 ATAGTCATAA CAGTTTTCAA CTTGTTCTTG TTTCTGTTAA ACTATATTCC 
2951 TAGAAACATA GTTTGAACAA CTTGGTCTTT GTTAGGCTTG TCAAATTGCC 
3001 TTCATGGAAA AATAATCTAC AAAAGTATGG TTTAATTGAT TGTCTTACAT 
3051 GATAATTTTC CCTGGCAACA ACTTAGTAAG TGATATATCT TTTTTCCTAA 
3101 ATTGCTTAAA TACTGTGAAA TTGCTCTGAC AAATTGGAAG TGTACCATTG 
3151 GCATATTTGT CTTCCTTTTT ATGCATGATG GTAAAATAAA AGCATGTTGT 
3201 TCTGCTAAGA AAAAAAAAAA AAAAAAAAA 



BLAST Results 



Entry AF042181 from database EMBLNEW: 

Homo sapiens testis-specific Y-encoded-like protein (TSPYL) mRNA,
partial cds. 

Score = 3411, P = 6.9e-148, identities = 685/687

Entry HS938343 from database EMBL:
human STS WI-11947.
Score = 1195, P = 2.1e-46, identities = 273/299



Medline entries 



98399864: 

Murine and human TSPYL genes: novel members of the TSPY-SET-NAP1L1 family



Peptide information for frame 3 



ORF from 99 bp to 1412 bp; peptide length: 438 
Category: strong similarity to known protein 
Classification: Differentiation/Development

1 MSGLDGVKRT TPLQTHSIII SDQVPSDQDA HQYLRLRDQS EATQVMAEPG 
51 EGGSETVALP PSPPSEEGGV PQDPAGRGGT PQIRVVGGRG HVAIKAGQEE 
101 GQPPAEGLAA ASVVMAADRS LKKGVQGGEK ALEICGAQRS ASELTAGAEA 
151 EAEEVKTGKC ATVSAAVAER ESAEVVVKEG LAEKEVMEEQ MEVEEQPPEG 
201 EEIEVAEEDR LEEEAREEEG PWPLHEALRM DPLEAIQLEL DTVNAQADRA 
251 FQQLEHKFGR MRRHYLERRN YIIQNIPGFW MTAFRNHPQL SAMIRGQDAE 
301 MLRYITNLEV KELRHPRTGC KFKFFFRRNP YFRNKLIVKE YEVRSSGRVV 
351 SLSTPIIWRR GHEPQSFIRR NQDLICSFFT WFSDHSLPES DKIAEIIKED 
401 LWPNPLQYYL LREGVRRARR RPLREPVEIP RPFGFQSG 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2dl5, frame 3 

TREMBL:AF042180_1 gene: "Tspyll"; product: "testis-specific
Y-encoded-like protein"; Mus musculus testis-specific Y-encoded-like
protein (Tspyll) mRNA, complete cds., N = 1, Score = 1202, P = 3.1e-122

TREMBL:AB018264_1 gene: "KIAA0721"; product: "KIAA0721 protein"; Homo
sapiens mRNA for KIAA0721 protein, partial cds., N = 1, Score = 798, P
= 2e-79

TREMBL:AB015345_1 gene: "HRIHFB2216"; Homo sapiens HRIHFB2216 mRNA,
partial cds., N = 1, Score = 570, P = 2.9e-55



>TREMBL:AF042180_1 gene: "Tspyll"; product: "testis-specific Y-encoded-like
protein"; Mus musculus testis-specific Y-encoded-like protein (Tspyll)
mRNA, complete cds. 

Length = 379 

HSPs: 
Score = 1202 U80.3 bits), Expect = 3.1e-122, P = 3.1e-122
Identities = 258/377 (68%), Positives = 283/377 (75%)



Query: 62 SPPSEEGGVPQDPAGR GGTPQIRVVGGRGHVAIKAGQEE — GQP-P — AEGLAA 110 

SP +EG D G GTP R + G G+ GPP EGL 

Sbjct: 3 SPERDEGTPVPDSRGHCDADTVSGTPDRRPLLGEEKAVTGEGRAGIVGSPAPRDVEGLVP 62 

Query: 111 ASVVMAADRSLKK-GVQGGEKALEICGAQRSASELTAGAEAEAEEVKTGKCATVSAAVAE 169 

V AA + V+G A+ + ++ T GAE++A +VKT + TV+AA 

Sbjct: 63 QIRVAAARQGESPPSVRGPAAAVFVTPKYVEKAQETRGAESQARDVKT-EPGTVAAAA-- 119 

Query: 170 RESAEVVVKEGLAEKEVMEEQMEVEEQPPEGEEIEVAEEDRLEEEAREEEGPWPLHEALR 229 

E +EV EE MEVE Q P GEE+E+ E EA EE GPW L LR 

Sbjct: 120 -EKSEVATPGS EEVMEVE-QKPAGEEMEMLEASGGVREAPEEAGPWHLGIDLR 170 

Query: 230 MDPLEAIQLELDTVNAQADRAFQQLEHKFGRMRRHYLERRNYIIQNIPGFWMTAFRNHPQ 289

+PLEAIQLELDTVNAQADRAFQ LE KFGRMRRHYLERRNYIIQNIPGFWMTAFRNHPQ
Sbjct: 171 RNPLEAIQLELDTVNAQADRAFQHLEQKFGRMRRHYLERRNYIIQNIPGFWMTAFRNHPQ 230

Query: 290 LSAMIRGQDAEMLRYITNLEVKELRHPRTGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRV 349 

LSAMIRG+DAEMLRY+T+LEVKELRHP+TGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRV 
Sbjct: 231 LSAMIRGRDAEMLRYVTSLEVKELRHPKTGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRV 290 

Query: 350 VSLSTPIIWRRGHEPQSFIRRNQDLICSFFTWFSDHSLPESDKIAEIIKEDLWPNPLQYY 409 

VSLSTPIIWRRGHEPQSFIRRNQDLICSFFTWFSDHSLPESD+IAEIIKEDLWPNPLQYY 
Sbjct: 291 VSLSTPIIWRRGHEPQSFIRRNQDLICSFFTWFSDHSLPESDRIAEIIKEDLWPNPLQYY 350 

Query: 410 LLREGVRRARRRPLREPVEIPRPFGFQSG 438

L REG+RR RRRP+REPVEIPRPFGFQSG
Sbjct: 351 LCREGIRRPRRRPIREPVEIPRPFGFQSG 379 

Pedant information for DKFZphfbr2_2dl5, frame 3 



Report for DKFZphfbr2_2dl5.3

[LENGTH] 438 

[MW] 49307.65 

[pI] 5.36

[HOMOL] TREMBL:AF042180_1 gene: "Tspyll"; product: "testis-specific Y-encoded-like
protein"; Mus musculus testis-specific Y-encoded-like protein (Tspyll) mRNA, complete cds. 1e-107

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YKR048c] 1e-07

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKR048c] 1e-07

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKR048c] 1e-07

[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YKR048c] 1e-07

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR048c] 1e-07

[BLOCKS] BL00376F 

[PIRKW] nucleus 6e-39 

[PIRKW] DNA binding 3e-06 

[PIRKW] phosphoprotein 6e-39 

[PIRKW] alternative splicing 6e-39 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 22.83 % 

SEQ MSGLDGVKRTTPLQTHSIIISDQVPSDQDAHQYLRLRDQSEATQVMAEPGEGGSETVALP 
SEG x 



PRD ccccccccccccccceeeeecccccccccchhhhhhhhchhhhhcccccccccceeeecc 

SEQ PSPPSEEGGVPQDPAGRGGTPQIRVVGGRGHVAIKAGQEEGQPPAEGLAAASVVMAADRS

SEG xxxxxxxxx 

PRD ccccccccccccccccccccceeeeecccceeeeecccccccccchhhhhhhhhhhhhcc 

SEQ LKKGVQGGEKALEICGAQRSASELTAGAEAEAEEVKTGKCATVSAAVAERESAEVVVKEG 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx . 

PRD ccccccccccceeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ LAEKEVMEEQMEVEEQPPEGEEIEVAEEDRLEEEAREEEGPWPLHEALRMDPLEAIQLEL 

SEG . xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhh 

SEQ DTVNAQADRAFQQLEHKFGRMRRHYLERRNYIIQNIPGFWMTAFRNHPQLSAMIRGQDAE

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeeecccccccccccccchhh 

SEQ MLRYITNLEVKELRHPRTGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRVVSLSTPIIWRR
SEG

PRD hhhhhhhhhhhhhcccccceeeeeeeccccccchhhhhhccccccccccccccceeeecc 

SEQ GHEPQSFIRRNQDLICSFFTWFSDHSLPESDKIAEIIKEDLWPNPLQYYLLREGVRRARR 

SEG xxxxxxxxxxx 

PRD ccccchhhhhhcccccceeeeeccccccccchhhhhhhhhcccccceeeeccccchhhhh 

SEQ RPLREPVEIPRPFGFQSG

SEG xxxxxxxx 

PRD hccccccccccccccccc 

(No Prosite data available for DKFZphfbr2_2dl5.3)
(No Pfam data available for DKFZphfbr2_2dl5.3)
DKFZphfbr2_2dl7



group: transmembrane proteins 

DKFZphfbr2_2dl7 encodes a novel 292 amino acid protein with similarity to a C.elegans 
hypothetical protein. 

One transmembrane region is predicted for the protein. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



similarity to C.elegans hypothetical protein 

TRANSMEMBRANE 1 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1009 bp 

Poly A stretch at pos. 990, polyadenylation signal at pos. 969 



1 TGGGCCTGTG GCTGGGGGCA GAGCTCAGAC TGTCTTCTGA AGATTGATGT 

51 CTATTTCCTT GAGCTCTTTA ATTTTGTTGC CAATTTGGAT AAACATGGCA 

101 CAAATCCAGC AGGGAGGTCC AGATGAAAAA GAAAAGACTA CCGCACTGAA 

151 AGATTTATTA TCTAGGATAG ATTTGGATGA ACTAATGAAA AAAGATGAAC 

201 CGCCTCTTGA TTTTCCTGAT ACCCTGGAAG GATTTGAATA TGCTTTTAAT 

251 GAAAAGGGAC AGTTAAGACA CATAAAAACT GGGGAACCAT TTGTTTTTAA 

301 CTACCGGGAA GATTTACACA GATGGAACCA GAAAAGATAC GAGGCTCTAG

351 GAGAGATCAT CACGAAGTAT GTATATGAGC TCCTGGAAAA GGATTGTAAT

401 TTGAAAAAAG TATCTATTCC AGTAGATGCC ACTGAGAGTG AACCAAAGAG 

451 TTTTATCTTT ATGAGTGAGG ATGCTTTGAC AAATCCACAG AAACTGATGG 

501 TTTTAATTCA TGGTAGTGGT GTTGTCAGGG CAGGGCAGTG GGCTAGAAGA 

551 CTTATTATAA ATGAAGATCT GGACAGTGGC ACACAGATAC CGTTTATTAA

601 AAGAGCTGTG GCTGAAGGAT ATGGAGTAAT AGTACTAAAT CCCAATGAAA 

651 ACTATATTGA AGTAGAAAAG CCGAAGATAC ACGTACAGTC ATCATCTGAT 

701 AGTTCAGATG AACCAGCAGA AAAACGGGAA AGAAAAGATA AAGTTTCTAA 

751 AGTAACAAAG AAGCGACGTG ATTTCTATGA GAAGTATCGT AACCCCCAAA

801 GAGAAAAAGA AATGATGCAA TTGTATATCA GAGTGAGTGA GATCACTACT 

851 TTCCTTTACT ATTTTCTTTA CCTTGTATAT ATTTTATTAT ATGTAGATTG 

901 TTTTGTTTTT CTTCAAGAAT ATTAATTTCT TTATTTGTCA TCATTTATTT 

951 CCCATGGTCG TCTACTTGGA TTAAATGGGT TTTTAAATTC AAAAAAAAAA 

1001 AAAAAAAAA 



BLAST Results 



Entry I89937 from database EMBL:

Sequence 11 from patent US 5723315.

Score = 1083, P = 2.2e-42, identities = 223/231

Entry I89938 from database EMBL:

Sequence 12 from patent US 5723315. 

Score = 875, P = 7.4e-33, identities = 175/175 




Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 47 bp to 922 bp; peptide length: 292 
Category: similarity to unknown protein 
Classification: unset 



1 MSISLSSLIL LPIWINMAQI QQGGPDEKEK TTALKDLLSR IDLDELMKKD 






51 EPPLDFPDTL EGFEYAFNEK GQLRHIKTGE PFVFNYREDL HRWNQKRYEA 
101 LGEIITKYVY ELLEKDCNLK KVSIPVDATE SEPKSFIFMS EDALTNPQKL 
151 MVLIHGSGVV RAGQWARRLI INEDLDSGTQ IPFIKRAVAE GYGVIVLNPN 
201 ENYIEVEKPK IHVQSSSDSS DEPAEKRERK DKVSKVTKKR RDFYEKYRNP 
251 QREKEMMQLY IRVSEITTFL YYFLYLVYIL LYVDCFVFLQ EY 

BLASTP hits 

Entry S67436 from database PIR: 

hypothetical protein - fission yeast (Schizosaccharomyces pombe) 
Length = 266 

Score = 112 (39.4 bits), Expect = 0.00037, P = 0.00037 
Identities = 33/147 (22%), Positives = 69/147 (46%)

Entry CEY75B8A_12 from database TREMBLNEW: 

gene: "Y75B8A.31"; Caenorhabditis elegans cosmid Y75B8A 

Score = 327, P = 1.5e-29, identities = 72/140, positives = 93/140



Alert BLASTP hits for DKFZphfbr2_2dl7, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2dl7, frame 2 



Report for DKFZphfbr2_2dl7.2 



[LENGTH] 292 

[MW] 34260.50

[pI] 5.50

[HOMOL] TREMBLNEW:AF064782_1 product: "unknown"; Mus musculus clone pEN87 unknown mRNA,
partial cds. 1e-119

[KW] SIGNAL_PEPTIDE 19

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 10.96 % 
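The LOW_COMPLEXITY value is consistent with a simple ratio of SEG-masked residues to peptide length: 10.96 % of 292 residues is 32 residues, and the same reading gives 41 of 302 for DKFZphfbr2_2cl8, 100 of 438 for DKFZphfbr2_2dl5 and 12 of 229 for DKFZphfbr2_2gl8. The Python sketch below assumes that formula, with rounding to two decimals; this is inferred from the printed values, not a documented Pedant definition. A helper for counting mask characters is included because the column alignment of the SEG and COILS rows did not survive extraction, so only their character counts remain usable.

```python
from typing import Iterable


def masked_percent(masked: int, length: int) -> float:
    """Masked residues as a percentage of peptide length, rounded to 2 decimals (assumed formula)."""
    return round(100 * masked / length, 2)


def count_marks(rows: Iterable[str], mark: str = "x") -> int:
    """Count mask characters (SEG 'x' or COILS 'C') in rows whose track label has been removed."""
    return sum(row.count(mark) for row in rows)


# Masked-residue counts implied by the LOW_COMPLEXITY values in this section.
IMPLIED = {
    "DKFZphfbr2_2cl8": (41, 302, 13.58),
    "DKFZphfbr2_2dl5": (100, 438, 22.83),
    "DKFZphfbr2_2dl7": (32, 292, 10.96),
    "DKFZphfbr2_2gl8": (12, 229, 5.24),
}

for name, (masked, length, printed) in IMPLIED.items():
    print(name, masked_percent(masked, length), printed)
```

The track label must be stripped before counting, since "COILS" itself contains an uppercase C.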

SEQ MSISLSSLILLPIWINMAQIQQGGPDEKEKTTALKDLLSRIDLDELMKKDEPPLDFPDTL 

SEG .xxxxxxxxxxxxxx 

PRD ccchhhhhhchhhhhhhccccccccccchhhhhhhhhhhhhcchhhhhhccccccccccc 

MEM 

SEQ EGFEYAFNEKGQLRHIKTGEPFVFNYREDLHRWNQKRYEALGEIITKYVYELLEKDCNLK 

SEG 

PRD hhhhhhcccccceeeecccccceeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhhe 

MEM 

SEQ KVSIPVDATESEPKSFIFMSEDALTNPQKLMVLIHGSGVVRAGQWARRLIINEDLDSGTQ

SEG 

PRD eeeccccccccccceeeeeeccccccccceeeeeecccccchhhhhcccccccccccccc 

MEM 

SEQ IPFIKRAVAEGYGVIVLNPNENYIEVEKPKIHVQSSSDSSDEPAEKRERKDKVSKVTKKR 

SEG 

PRD chhhhhhhhccceeeeeccccceeeeeccceeeeccccccccchhhhhhhhhhhhhhhhh 

MEM 

SEQ RDFYEKYRNPQREKEMMQLYIRVSEITTFLYYFLYLVYILLYVDCFVFLQEY 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhcccchhhhhhhhhhhhheeeeehhhhhhhhhhhhheeeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMM 



(No Prosite data available for DKFZphfbr2_2dl7.2)
(No Pfam data available for DKFZphfbr2_2dl7.2)
DKFZphfbr2_2d20



group: brain derived 

DKFZphfbr2_2d20 encodes a novel 197 amino acid protein with similarity to Synechocystis sp. 
P74594 hypothetical 32.8 kD protein.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes.

similarity to Synechocystis sp. (PCC 6803) 
complete cDNA, complete cds, EST hits 

potential start at bp 67 matches Kozak consensus ANCatgG

Sequenced by Qiagen 

Locus: unknown

Insert length: 1787 bp 

Poly A stretch at pos. 1768, polyadenylation signal at pos. 1743

1 TGGGGCGGCC GCGGCGGGAA CATGGAGGAG CTGCTGAGGC GCGAGCTGGG 
51 CTGCAGCTCT GTCAGGGCCA CGGGCCACTC GGGGGGCGGG TGCATCAGCC 

101 AGGGCCGGAG CTACGACACG GATCAAGGAC GAGTGTTCGT GAAAGTGAAC

151 CCCAAGGCGG AGGCCAGAAG AATGTTTGAA GGTGAGATGG CAAGTTTAAC 

201 TGCCATCCTG AAAACAAACA CGGTGAAAGT GCCCAAGCCC ATCAAGGTTC 

251 TGGATGCCCC AGGCGGCGGG AGCGTGCTGG TGATGGAGCA CATGGACATG 

301 AGGCATCTGA GCAGTCATGC TGCAAAGCTT GGAGCCCAGC TGGCCGATTT 

351 ACACCTTGAT AACAAGAAGC TTGGAGAGAT GCGCCTGAAG GAGGCGGGCA 

401 CAGTGTGGAG AGGAGGTGGG CAGGAGGAAC GGCCCTTTGT GGCCCGGTTT 

451 GGATTTGACG TGGTGACGTG CTGTGGATAC CTCCCCCAGG TGAATGACTG 

501 GCAGGAGGAC TGGGTCGTGT TCTATGCCCG GCAGCGCATT CAGCCCCAGA 

551 TGGACATGGT GGAGAAGGAG TCTGGGGACA GGGAGGCCCT CCAGCTTTGG 

601 TCTGCTCTGC AGTAAAAGAT CCCTGACCTG TTCCGTGACC TGGAGATCAT 

651 CCCAGCCTTA CTCCACGGGG ACCTCTGGGG TGGAAACGTA GCAGAGGATT 

701 CCTCTGGGCC GGTGATTTTT GACCCAGCTT CTTTCTACGG CCACTCGGAA 

751 TATGAGCTGG CAATAGCTGG CATGTTTGGG GGCTTTAGCA GCTCCTTTTA 

801 CTCCGCCTAC CACGGCAAAA TCCCCAAGGC CCCAGGATTC GAGAAGCGCC 

851 TTCAGTTGTA TCAGCTCTTT CACTACTTGA ACCACTGGAA TCATTTTGGA 

901 TCGGGGTACA GAGGATCCTC CCTGAACATC ATGAGGAATC TGGTCAAGTG 

951 AGCGGGCCTT ACTCTGGAAG GAGGTCTCAG AGGTTTCTCC ACAGTCCTCT 
1001 TCTGGGCAAA TTCTTGTTTC TTCACATGCC GGACTAGCTT AAGACCAATG 
1051 CAGTAGCTTA TTTCCAAGCC TTGCAAAGTA TATAATATCT AAGAGGAAAG 
1101 GTTTTGTCAT CCCAGCGTTG TCCACTTTGT GGGGCTTTGT AGGTAGACGG 
1151 AGCCACACTA CAGGCAGGGT ATGAGCAGAG GGATGTATGG AGTGTGGGCG 
1201 ACTCTGAGCC TCACTGCTGC TGCAAGGTGG GGAAACTGTA AGTGAACCCC 
1251 TGTGGGTGCG GGGGAGGGTA TCCGGTGCGC AGGGAGGTGG CCAGCGCCCC 
1301 CGGGCACTGC TGCTCATAGG TACCTTTCCG CTGCCTCCTC CCTGCTCTCC 
1351 TGTGCAGGAA TGTCTCTGAG CTGTTCACGT TGATGCTTCT TGGTTGGCAA 
1401 GACTTGGGTG TAGACATGAA ACCACCTTAC TAAAAGCGTC TTAAAATGAC 
1451 CAATTCCAGA ATCAAGCGTA TTCCGTTTTC CTCCTGCATG ATCCCTGGGC 
1501 CCTCCCGCAG GCTGAGCAAG TCTGTAAACT GATTCTGGGA GAAACCAAGC 
1551 TGCTGGCCGT AGGATGTCCT TGGGTACATC CAGGAGTCTT CATTGCTTCT 
1601 GTTATTACCC CGTCTCCTCT GCCATTTTCT ACAGCTTGCT GAGTTGTCAT 
1651 TCCTTTGCAA CATTAAAATA CATGCTGAAC TCATATTTTT CCTTCCTTCA 
1701 CTGTTGTAGT AAAGAGACAT ATTTCATGAA TGGCATTGAT GCTAATAAAC 
1751 CCTTTGCCCA AAAATTTGAA AAAAAAAAAA AAAAAAA 


BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 1 
ORF from 22 bp to 612 bp; peptide length: 197
Category: similarity to unknown protein 
Prosite motifs: LEUCINE_ZIPPER (117-139) 



1 MEELLRRELG CSSVRATGHS GGGCISQGRS YDTDQGRVFV KVNPKAEARR 
51 MFEGEMASLT AILKTNTVKV PKPIKVLDAP GGGSVLVMEH MDMRHLSSHA 
101 AKLGAQLADL HLDNKKLGEM RLKEAGTVWR GGGQEERPFV ARFGFDVVTC 
151 CGYLPQVNDW QEDWVVFYAR QRIQPQMDMV EKESGDREAL QLWSALQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2d20, frame 1
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2d20, frame 1 



Report for DKFZphfbr2_2d20.1



[LENGTH] 197

[MW] 21963.25

[pI] 6.96

[HOMOL] PIR:S76790 hypothetical protein - Synechocystis sp. (strain PCC 6803) 9e-12

[SUPFAM] hypothetical protein b1725 1e-06

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] MYRISTYL 2 

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 2 

[KW] Alpha_Beta 



SEQ MEELLRRELGCSSVRATGHSGGGCISQGRSYDTDQGRVFVKVNPKAEARRMFEGEMASLT 

PRD ccchhhhhccccceeeeccccccceeeccccccccceeeeeeccchhhhhhhhhhhhhhh 

SEQ AILKTNTVKVPKPIKVLDAPGGGSVLVMEHMDMRHLSSHAAKLGAQLADLHLDNKKLGEM

PRD hhhhhheeeeccceeeecccccceeeeecccccccchhhhhhhhhhhhhhhcccccchhh 

SEQ RLKEAGTVWRGGGQEERPFVARFGFDVVTCCGYLPQVNDWQEDWVVFYARQRIQPQMDMV

PRD hhhhhccccccccccccceeeccccceeeccccccccccccchhhhhhhhhhhhhhhhhh 

SEQ EKESGDREALQLWSALQ 

PRD hhhccchhhhhhhhccc 



Prosite for DKFZphfbr2_2d20.1



PS00002  20->24    GLYCOSAMINOGLYCAN  PDOC00002
PS00005  13->16    PKC_PHOSPHO_SITE   PDOC00005
PS00005  67->70    PKC_PHOSPHO_SITE   PDOC00005
PS00008  22->28    MYRISTYL           PDOC00008
PS00008  104->110  MYRISTYL           PDOC00008
PS00029  96->118   LEUCINE_ZIPPER     PDOC00029



(No Pfam data available for DKFZphfbr2_2d20.1)
DKFZphfbr2_2gl8



group: brain derived 

DKFZphfbr2_2gl8 encodes a novel 229 amino acid protein with partial similarity to the human
dJ30M3.2 gene product. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 

dJ30M3.2 extension of gene model

complete cDNA, complete cds, EST hits 
(mouse ESTs with >90% Identities) 

Sequenced by Qiagen 

Locus: /map="6p22.1-22"

Insert length: 2444 bp 

Poly A stretch at pos. 2425, no polyadenylation signal found 

1 TGGTCGAGGG TCGACGGTAT CGATAAGTTT TTTTTTTTTT TTTTTTTTTT 

51 TGGAAAGCAA GGATCACACT TCCCCCTCCC TGTTCCTTAA TCCCTTTTCT 

101 AAAAAGGGGG GAAAATCCGG ATGGATTTTA GGGATTGGTC TGGTGTCAGC 

151 TGTGTCTTAT TGCACACCTA AATCCTGATT ATAGGCTTTT CATTTCTCCG 

201 CAAAGCCTTT ATTTTGGCAG TTAAGCCAAA TGTGTTTTCC AGAAAGTTAG 

251 TTATTTTCTC CTCTTTCTTT CCTTTCTTTC CTCCCTTTTT CCCGTCTGAC 

301 CCCAAACGTT ATTGTCCAAA CATGACTGGA CAGCAGCTTT TGTTTCTTGA 

351 CCCTGTAATA TGACAGTCTG CTAATATTGA CAGAAGGTGC AGTTTTTGGG 

401 TTATAGTCGT GATTTTCGCT AATCAATCAT ATTAGCAGGA AAAAAAATGA 

451 CTTGTTTCTG TTGTACTTGA GTCTTAAGAA AAAGTGCCCA TAGTTTAGTG

501 ACAATTTCCA AAGGCTTTAG TACCACCTGT ATTTCAAAAT GGGGGACCCA 

551 AACTCCCGGA AGAAACAAGC TCTGAACAGA CTACGTGCTC AGCTTAGAAA 

601 GAAAAAAGAA TCTCTAGCTG ACCAGTTTGA CTTCAAGATG TATATTGCCT 

651 TTGTATTCAA GGAGAAGAAG AAAAAGTCAG CACTTTTTGA AGTGTCTGAG 

701 GTTATACCAG TCATGACAAA TAATTATGAA GAAAATATCC TGAAAGGTGT 

751 GCGAGATTCC AGCTATTCCT TGGAAAGTTC CCTAGAGCTT TTACAGAAGG

801 ATGTGGTACA GCTCCATGCT CCTCGATATC AGTCTATGAG AAGGGATGTA 

851 ATTGGCTGTA CTCAGGAGAT GGATTTCATT CTTTGGCCTC GGAATGATAT 

901 TGAAAAAATC GTCTGTCTCC TGTTTTCTAG GTGGAAAGAA TCTGATGAGC 

951 CTTTTAGGCC TGTTCAGGCC AAATTTGAGT TTCATCATGG TGACTATGAA 

1001 AAACAGTTTC TGCATGTACT GAGCCGCAAG GACAAGACTG GAATCGTTGT 

1051 CAACAATCCT AACCAGTCAG TGTTTCTCTT CATTGACAGA CAGCACTTGC 

1101 AGACTCCAAA AAACAAAGCT ACAATCTTCA AGTTATGCAG CATCTGCCTC 

1151 TACCTGCCAC AGGAACAGCT CACCCACTGG GCAGTTGGCA CCATAGAGGA 

1201 TCACCTCCGT CCTTATATGC CAGAGTAGAG TACTGACCAG CAAAATGGAG 

1251 AAGATCAGAG AATGCAGCAG CAGTTTTTTT TCTTGTTTTC TTACCACTTT 

1301 ATTCTTTCAG AGTTTAAAGA AAATGGACTC ATGCACAGAA CACTATGCAT 

1351 TTTGAAACTT GTTCATCCTG GATTTTTTTA AATCATTTTT ATCTCAGAAC 

1401 TTAAACAAAA ATTAGATGTC GTGCACGGAC TGTGTGAAAG AAGATGCTTT 

1451 GCATATTTGC TGCACTGCAT CAGTATCTTA CTAAAAATGT GAAATGAAAG 

1501 GACTATTGTA CACTGAAATG CTTAAATGTA TCTGAAAGCA CAAGGTGATA 

1551 CTCATTTTTA TGGTCTTCCC ATTTGTGCTG GTTTTTGCCT CTTTGACATC 

1601 TGTCATCAGT ATTTAGAGGG TGAGAAGTGA ATGTAACAGG TATAAATAAC 

1651 ATTTTTAAAA ACAATAACTT TGCTATAATC ACAGTTGTTC CAGAGCACTG 

1701 TCAGATACAT TCTAATGACC AGAACTGGTT TAAAAAAAGA AAATACAACC 

1751 ATGGGAAAGA AATCTTAAAT GAAAAACGCA TCTCATTGTA GGCATTTTTG 

1801 CCTCATATTT TACTGGGCCA TGTTTGTTTC CTGGTACTCA TGTATTTTTT 

1851 TTTTTTCCAG ATCTCTTTCC CCAAGTTGCT ATTGTAAGAG TATTCTGCTG 

1901 CGTGTGGATG CAGTTATACA CATTAAAGCA GATCTGGAGT CTGAAGTAGC 

1951 TATAAAGCAG CTATAAAACA GAAATACATG CATAGCTGCA GAAACCATGA 

2001 TAGGTAGAGG ACTTTTCTTT TGGTTTTGTT TTGTTTTGTT TTGTTTTGTT 

2051 TTTGGTTTTA CAGAGAAGAG ATTTTTATTA CAAAGAAAAA AATTCCAGTG 

2101 AATTGTGCAG AAATGCTGGT TTTTACACCA TCCTAAAGAA AAACTTTACA 

2151 AGGGTGTTTT GGAGTAGAAA AAAGGTTATA AAGTTGGAAT CTTAAATTGT 

2201 AAAATTAACC ATTGAGTGTC AAAGTTCTAA AAGCAGAACT CATTTCGTGC 

2251 AATGAACATA AGGAAAGACT ACTGTATAGG TTTTTTTTTT TCTCCTTTTA 

2301 AATGAAGAAA AGCTTTGCTT AAGGGTTGCA TACTTTTATT GGAGTAAATC 

2351 TGAATGATCC TACTCCTTTG GAGTAAGACT AGTGCTTACC AGTTTCCAAT 

2401 TGTATTTAGC TTCTGTTGGA ATTTGAAAAA AAAAAAAAAA AAAA 



BLAST Results 
Entry HS338352 from database EMBL:
human STS EST171398. 
Score = 1747, P = 3.0e-74, identities = 359/365

Entry HS447255 from database EMBL: 
human STS SHGC-10143. 
Score = 1717, P = 6.5e-73, identities = 365/383

Entry HS30M3 from database EMBLNEW: 

Human DNA sequence from clone 30M3 on chromosome 6p22.1-22.3. Contains 
three novel genes, one similar to C. elegans Y63D3A.4 and one similar 
to (predicted) plant, worm, yeast and archaea bacterial genes, and the 
first exon of the KIAA0319 gene. Contains ESTs, GSSs and putative CpG 
islands. 

Score = 6646, P = 0.0e+00, identities = 1344/1355



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 539 bp to 1225 bp; peptide length: 229 
Category: putative protein 



1 MGDPNSRKKQ ALNRLRAQLR KKKESLADQF DFKMYIAFVF KEKKKKSALF 

51 EVSEVIPVMT NNYEENILKG VRDSSYSLES SLELLQKDVV QLHAPRYQSM

101 RRDVIGCTQE MDFILWPRND IEKIVCLLFS RWKESDEPFR PVQAKFEFHH 

151 GDYEKQFLHV LSRKDKTGIV VNNPNQSVFL FIDRQHLQTP KNKATIFKLC 

201 SICLYLPQEQ LTHWAVGTIE DHLRPYMPE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2gl8, frame 2 

TREMBLNEW:HS30M3_2 gene: "dJ30M3.2"; product: "dJ30M3.2 (novel
protein)"; Human DNA sequence from clone 30M3 on chromosome
6p22.1-22.3. Contains three novel genes, one similar to C. elegans
Y63D3A.4 and one similar to (predicted) plant, worm, yeast and archaea
bacterial genes, and the first exon of the KIAA0319 gene. Contains
ESTs, GSSs and putative CpG islands., N = 1, Score = 470, P = 1.1e-44



>TREMBLNEW:HS30M3_2 gene: "dJ30M3.2"; product: "dJ30M3.2 (novel protein)";
Human DNA sequence from clone 30M3 on chromosome 6p22.1-22.3. Contains
three novel genes, one similar to C. elegans Y63D3A.4 and one similar to 
(predicted) plant, worm, yeast and archaea bacterial genes, and the first 
exon of the KIAA0319 gene. Contains ESTs, GSSs and putative CpG islands. 
Length = 86 

HSPs: 

Score = 470 (70.5 bits), Expect = 1.1e-44, P = 1.1e-44
Identities = 86/86 (100%), Positives = 86/86 (100%)

Query: 144 AKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFLFIDRQHLQTPKNKATIFKLCSIC 203 

AKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFLFIDRQHLQTPKNKATIFKLCSIC 
Sbjct: 1 AKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFLFIDRQHLQTPKNKATIFKLCSIC 60 

Query: 204 LYLPQEQLTHWAVGTIEDHLRPYMPE 229 

LYLPQEQLTHWAVGTIEDHLRPYMPE 
Sbjct: 61 LYLPQEQLTHWAVGTIEDHLRPYMPE 86 



Pedant information for DKFZphfbr2_2gl8, frame 2 



Report for DKFZphfbr2_2gl8.2

[LENGTH] 229

[MW] 27083.42 

[pI] 9.04

[HOMOL] TREMBL:HS30M3_2 gene: "dJ30M3.2"; product: "dJ30M3.2 (novel protein)"; Human
DNA sequence from clone 30M3 on chromosome 6p22.1-22.3. Contains three novel genes, one
similar to C. elegans Y63D3A.4 and one similar to (predicted) plant, worm, yeast and archaea
bacterial genes, and the first exon of the KIAA0319 gene. Contains ESTs, GSSs and putative CpG
islands. 6e-47

[PROSITE] MYRISTYL 2 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 4

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 4

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 5.24 %



SEQ MGDPNSRKKQALNRLRAQLRKKKESLADQFDFKMYIAFVFKEKKKKSALFEVSEVIPVMT 

SEG 

PRD cccccchhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhheeeeec

SEQ NNYEENILKGVRDSSYSLESSLELLQKDVVQLHAPRYQSMRRDVIGCTQEMDFILWPRND 

SEG xxxxxxxxxxxx 

PRD cchhhhhhhcccccccccchhhhhhhhhhhhhhccccccccceeecccccceeeecccch 

SEQ IEKIVCLLFSRWKESDEPFRPVQAKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFL 

SEG 

PRD hhhhhhhhhhhccccccccccccccccccccchhhhhhhhhhhcccceeeeccccceeee 

SEQ FIDRQHLQTPKNKATIFKLCSICLYLPQEQLTHWAVGTIEDHLRPYMPE

SEG 

PRD eeecccccccccceeeeeeeeeeeeeccccccccceeeecccccccccc 
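The [KW] LOW_COMPLEXITY value appears to be the share of residues masked by SEG: the SEG line above flags 12 residues, and 12/229 = 5.24 %. The DKFZphfbr2_2kl9 report further down fits the same reading (11 masked residues of 303 give 3.63 %, and 44 COILS residues give 14.52 % COILED_COIL). A minimal sketch under that assumption; masked_percentage is a hypothetical helper, not part of the Pedant software:

```python
def masked_percentage(mask: str, length: int, flag: str = "x") -> float:
    """Percentage of a protein's residues carrying `flag` in an annotation line.

    `mask` holds per-residue annotation characters, e.g. a SEG line in which
    'x' marks low-complexity residues or a COILS line in which 'C' marks
    coiled-coil residues; `length` is the full protein length.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * mask.count(flag) / length


# The SEG line above flags 12 residues of the 229-residue DKFZphfbr2_2gl8.2.
print(f"{masked_percentage('x' * 12, 229):.2f} %")  # 5.24 %
```

Only the count of flagged characters matters, so the sketch still works on lines whose column positions were lost in extraction.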



Prosite for DKFZphfbr2_2gl8.2 



PS00001 175->179 ASN_GLYCOSYLATION PDOC00001
PS00004 22->26 CAMP_PHOSPHO_SITE PDOC00004
PS00004 44->48 CAMP_PHOSPHO_SITE PDOC00004
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005
PS00005 99->102 PKC_PHOSPHO_SITE PDOC00005
PS00005 162->165 PKC_PHOSPHO_SITE PDOC00005
PS00005 189->192 PKC_PHOSPHO_SITE PDOC00005
PS00006 25->29 CK2_PHOSPHO_SITE PDOC00006
PS00006 80->84 CK2_PHOSPHO_SITE PDOC00006
PS00006 162->166 CK2_PHOSPHO_SITE PDOC00006
PS00006 218->222 CK2_PHOSPHO_SITE PDOC00006
PS00007 69->77 TYR_PHOSPHO_SITE PDOC00007
PS00008 70->76 MYRISTYL PDOC00008
PS00008 168->174 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_2gl8.2)
DKFZphfbr2_2hl



group: brain derived 

DKFZphfbr2_2hl encodes a novel 180 amino acid protein with weak similarity to the C. elegans
D2007.4 protein.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to C. elegans D2007.4 protein
CpG island in 5' region, complete cDNA
Sequenced by Qiagen 
Locus: unknown
Insert length: 957 bp 
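The note records a CpG island in the 5' region but does not say how it was called. A common screen, assumed here rather than taken from the source, computes GC content and the CpG observed/expected ratio (CpG count * length / (C count * G count)) over a window and applies the Gardiner-Garden and Frommer thresholds: at least 200 bp, GC above 50 % and observed/expected above 0.6. A minimal sketch with hypothetical helper names:

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, CpG observed/expected ratio) for one DNA window."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # 'CG' cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str) -> bool:
    """Assumed Gardiner-Garden/Frommer criteria: >= 200 bp, GC > 0.5, obs/exp > 0.6."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= 200 and gc > 0.5 and obs_exp > 0.6
```

In practice the window is slid along the 5' end of the cDNA or genomic sequence; a single call only classifies one window.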

Poly A stretch at pos. 939, polyadenylation signal at pos. 916 
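Both positions come from simple sequence scans. A minimal sketch, assuming the canonical hexamer AATAAA plus its most common variant ATTAAA as signals and a terminal run of at least eight A's as the poly(A) stretch (both choices are assumptions, and the helper names are hypothetical). Note that the reported coordinates may be offset by one from 1-based numbering: in the listing below, AATAAA begins at base 917 and the terminal A run at base 940, whereas the report gives 916 and 939. The sketch reports 1-based positions.

```python
import re
from typing import Optional

# AATAAA is the canonical signal; ATTAAA is its most frequent variant.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas(seq: str) -> list[int]:
    """1-based start positions of candidate polyadenylation signals."""
    s = seq.upper()
    hits = []
    for hexamer in PAS_HEXAMERS:
        # A zero-width lookahead reports overlapping occurrences too.
        hits.extend(m.start() + 1 for m in re.finditer(f"(?={hexamer})", s))
    return sorted(hits)


def poly_a_start(seq: str, min_run: int = 8) -> Optional[int]:
    """1-based start of a 3'-terminal run of at least `min_run` A's, else None."""
    s = seq.upper()
    body = s.rstrip("A")
    return len(body) + 1 if len(s) - len(body) >= min_run else None
```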



1 GGGGGTCCCT GACTTTATAT GGCTGCTCCT GGCGAGCGAC TGAGTCGTCC 

51 GTGAGGAAAA AGAGGCGAGG CTTTTCCGAG ATCGTCTCAG CGATGGCGCT 

101 TCGGTCGCGG TTTTGGGGGT TGTTCTCGGT TTGCAGGAAC CCTGGGTGCA 

151 GGTTCGCAGC CCTGTCAACC AGCTCCGAGC CGGCAGCGAA ACCTGAAGTG 

201 GACCCTGTGG AAAATGAAGC TGTCGCCCCA GAATTCACCA ACCGGAACCC 

251 CCGGAACCTG GAGCTTTTGT CTGTAGCCAG GAAAGAGCGG GGCTGGCGGA 

301 CGGTGTTTCC CTCCCGTGAG TTCTGGCACA GGTTGCGAGT TATAAGGACT 

351 CAGCATCATG TAGAAGCACT TGTGGAGCAT CAGAATGGCA AGGTTGTGGT 

401 TTCGGCCTCC ACTCGTGAGT GGGCTATTAA AAAGCACCTT TATAGTACCA 

451 GAAATGTGGT GGCTTGTGAG AGTATAGGAC GAGTGCTGGC ACAGAGATGC

501 TTAGAGGCGG GAATCAACTT CATGGTCTAC CAACCAACCC CGTGGGAGGC 

551 AGCCTCAGAC TCGATGAAAC GACTACAAAG TGCCATGACA GAAGGTGGTG 

601 TGGTTCTACG GGAACCTCAG AGAATCTATG AATAAATGGA AGCATTAATT 

651 GTTTTGAACA TGTAAATATA AATCTGTCAG CCACTACAGC CATCAAAAGA 

701 GAGCATCTGG AAGAACAGCC AGCTTGGAAG TTTTACAGCA ATAATGTTGC 

751 AGTGGAATAT TATTTGTAGT TAAGGTCATC CTCCTCCCCT TTCTGTTTTT 

801 TTAAATCAAG AACTACGTTC TGCCCCTCTC TTGGGCTTCA GAAGCATCTA 

851 AGAAAAGCAG TCATCAATTA TAATTAACTT TCAAAGGGCA AGTCAGAAGT 

901 TGTTTATAAA TTACAAAATA AAGGCATATT ATGAACTCTA AAAAAAAAAA 

951 AAAAAAA 
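The listing above gives 50 bases per row in blocks of ten, and each row is prefixed with the 1-based position of its first base. Before computing anything on such a listing, it helps to join the rows and check each prefix against the number of bases read so far. A minimal sketch (parse_listing is a hypothetical helper), which also rejects rows whose prefix was damaged in text extraction, for example a position split by a stray space:

```python
import re


def parse_listing(text: str) -> str:
    """Join a numbered, space-blocked sequence listing into one string.

    Each non-blank line must read '<start> <block> <block> ...', where
    <start> is the 1-based coordinate of the line's first base.
    """
    parts = []
    count = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank spacer lines between rows
        m = re.fullmatch(r"(\d+)\s+([A-Za-z][A-Za-z ]*)", line)
        if m is None:
            raise ValueError(f"unexpected line: {line!r}")
        start = int(m.group(1))
        if start != count + 1:
            raise ValueError(f"line starts at {start}, expected {count + 1}")
        bases = m.group(2).replace(" ", "")
        parts.append(bases)
        count += len(bases)
    return "".join(parts)
```

With the first two rows of the listing above, the result is 100 bases long and the ATG at positions 93-95 opens the ORF reported below.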



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 93 bp to 632 bp; peptide length: 180 
Category: similarity to known protein 
Classification: unset 
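The ORF coordinates and the peptide length agree by plain triplet arithmetic: 632 - 93 + 1 = 540 bp, or 180 codons. In the listing above, the codon following base 632 is TAA, so the reported ORF end appears to exclude the stop codon. A minimal sketch of that arithmetic and of translation with the standard genetic code; translate_orf and peptide_length are hypothetical helpers:

```python
BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def peptide_length(start: int, end: int) -> int:
    """Codons in a 1-based, inclusive ORF that excludes the stop codon."""
    return (end - start + 1) // 3


def translate_orf(seq: str, start: int, end: int) -> str:
    """Translate bases start..end (1-based, inclusive); '*' marks a stop codon."""
    orf = seq.upper()[start - 1:end]
    if len(orf) % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf), 3))
```

Bases 93-110 of the listing, ATGGCGCTTCGGTCGCGG, translate to MALRSR, the start of the peptide shown below. The same arithmetic fits the other entries in this section (182-841 gives 220, 48-650 gives 201, 107-1015 gives 303).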



1 MALRSRFWGL FSVCRNPGCR FAALSTSSEP AAKPEVDPVE NEAVAPEFTN 
51 RNPRNLELLS VARKERGWRT VFPSREFWHR LRVIRTQHHV EALVEHQNGK 
101 VVVSASTREW AIKKHLYSTR NVVACESIGR VLAQRCLEAG INFMVYQPTP
151 WEAASDSMKR LQSAMTEGGV VLREPQRIYE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2hl, frame 3 
PIR:S44789 D2007.4 protein - Caenorhabditis elegans, N = 1, Score = 194, P = 2e-15

PIR:JC5753 ribosomal protein L18 - Vibrio proteolyticus, N = 1, Score = 121, P = 1.1e-07



>PIR:S44789 D2007.4 protein - Caenorhabditis elegans 
Length = 170

HSPs: 

Score = 194 (29.1 bits), Expect = 2.0e-15, P = 2.0e-15
Identities = 51/134 (38%), Positives = 78/134 (58%) 

Query: 48 FTNRNPRNLELLSVARKERGWRTVFP--SREFWHRLRVIRTQHHVEA-LVEHQNGKVVVS 104 

F NRNPRN EL+ G++ +R + +++ ++ + H E LV +Q+G VV+S 

Sbjct: 9 FVNRNPRNNELMGRQAPNTGYQFEKDRAARSYIYKVELVEGKSHREGRLVHYQDG-VVIS 67 

Query: 105 ASTREWAIKKHLYSTRNVVACESIGRVLAQRCLEAGINFMVYQPTPWEAASDSMKRLQ-- 162

AST+E +I LYS + A +IGRVLA RCL++GI+F + T EA S +
Sbjct: 68 ASTKEPSIASQLYSKTDTSAALNIGRVLALRCLQSGIHFAMPGATK-EAIEKSQHQTHFF 126 

Query: 163 SAMTEGGVVLREPQRI 178 

A+ E G+ L+EP + 
Sbjct: 127 KALEEEGLTLKEPAHV 142 
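The Identities and Positives figures are counts read from the alignment midline, where a letter marks an identical pair and '+' a conservative substitution. The printed percentages are truncated rather than rounded (78/134 = 58.2 % is shown as 58 %, and 76/156 = 48.7 % in the DKFZphfbr2_2kl9 entry as 48 %). In the gapped HSPs of this section the denominator equals the longer of the two aligned segments (here subject residues 9-142, i.e. 134) rather than the number of alignment columns, so the sketch takes it as a parameter instead of deriving it. The midlines in this extracted text have lost their spacing, so the counting function assumes an intact midline; hsp_counts and blast_percent are hypothetical helpers:

```python
def hsp_counts(midline: str) -> tuple[int, int]:
    """Count identities and positives in an intact BLAST midline.

    A letter marks an identical pair, '+' a conservative (positive)
    substitution, and a space a mismatch or a gap.
    """
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


def blast_percent(count: int, denominator: int) -> int:
    """Whole-number percentage, truncated rather than rounded."""
    return 100 * count // denominator
```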



Pedant information for DKFZphfbr2_2hl, frame 3 



Report for DKFZphfbr2_2hl.3



[LENGTH] 180 

[MW] 20576.57 

[pI] 9.63

[HOMOL] PIR:S44789 D2007.4 protein - Caenorhabditis elegans 2e-13 

[FUNCAT] mrna translation and ribosome biogenesis [H. influenzae, HI0794] 2e-04

[SUPFAM] Escherichia coli ribosomal protein L18 8e-06

[KW] Alpha_Beta
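The [MW] and [pI] entries in the report above are standard sequence-derived quantities. The report does not say which residue masses or pKa scale were used, so the sketch below, which assumes average residue masses and one illustrative pKa set, need not reproduce the printed values to the last digit. It sums residue masses plus one water for the average molecular weight, and finds the pI by bisection on the net charge given by the Henderson-Hasselbalch equation; all helper names are hypothetical:

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528

# One illustrative pKa set (an assumption; published scales differ).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 9.0, 2.0


def average_mass(seq: str) -> float:
    """Average molecular weight: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Net charge from the Henderson-Hasselbalch equation for each ionisable group."""
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))
    for aa in seq:
        if aa in PKA_POSITIVE:
            positive += 1 / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1 / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH; the net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive: the pI lies above mid
        else:
            high = mid
    return (low + high) / 2
```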



SEQ MALRSRFWGLFSVCRNPGCRFAALSTSSEPAAKPEVDPVENEAVAPEFTNRNPRNLELLS

PRD ccccccceeeeeeeecccccceeeecccccccccccccccceeeecccccccccchhhhh

SEQ VARKERGWRTVFPSREFWHRLRVIRTQHHVEALVEHQNGKVVVSASTREWAIKKHLYSTR

PRD hhhhcccccccchhhhhhhhhhccccchhhhhhhhhcccceeeeechhhhhhhhhhhhcc

SEQ NVVACESIGRVLAQRCLEAGINFMVYQPTPWEAASDSMKRLQSAMTEGGVVLREPQRIYE

PRD ccceeehhhhhhhhhhhhhcceeeeeccccchhhhhhhhhhhhhhhccceeecccccccc



(No Prosite data available for DKFZphfbr2_2hl.3)
(No Pfam data available for DKFZphfbr2_2hl.3)
DKFZphfbr2_2hl0



group: brain derived 

DKFZphfbr2_2hl0 encodes a novel 220 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown

Insert length: 2176 bp 

Poly A stretch at pos. 2161, polyadenylation signal at pos. 2143 

1 TGGGGAGTAT TCTAATTATA TTTTATATTT AATAAATTAT TTTTCTATTT 
51 CTTTGTTATA TTAAGTTGCA CACTTGTTTC TTTTATCCAG AAAGTTTAGT 

101 ATAATAAAAA TAGTTTTAAG ATTAACTGTG AATGTAAAGG AAAAGTATTA 

151 TTAATTATTT CAGGAAATTG CAAGACCTAA CATGGCTGAA AGAGAAACAG 

201 AAACATCAAA TTCTGAAAGT AAACAAGATA AAGCTGCTTC TTCAAAAGAA 

251 AAAAATGGAT GTAATGCAAA TTCATTTGAA GGCTCATCAA CAACAAAAAG 

301 TGAAGAAAGC ATAACAGTTT CAGATAAGGA AAATGAAACC TGTCTTGCAG 

351 ACCAGGAAAC TGGCTCAAAA AACATCGTCA GTTGTGATTC AAATATTGGT 

401 GCAGATAAAG TGGAAAAGAA AAAACAAATA CAACACGTTT GTCAGGAAAT 

451 GGAGTTGAAG ATGTGCCAGA GTTCAGAAAA CATAATCTTA TCTGATCAGA 

501 TTAAAGATCA CAACTCCAGT GAAGCCAGAT TTTCTTCAAA GAATATTAAG 

551 GATTTGCGAT TAGCATCAGA TAATGTAAGC ATTGATCAGT TTTTGAGAAA 

601 AAGACATGAA CCTGAATCTG TTAGTTCTGA TGTTAGCGAG CAAGGCAGTA 

651 TTCATTTGGA ACCTCTGACT CCATCCGAGG TACTTGAGTA TGAAGCCACA 

701 GAGATTCTTC AGAAAGGTAG TGGTGATCCT TCAGCCAAGA CTGATGAAGT 

751 AGTGTCTGAT CAAACAGATG ACATTCCTGG AGGAAATAAC CCTAGCACAA 

801 CAGAGGCAAC AGTAGACCTG GAAGATGAAA AAGAAAGAAG TTGAAATTAG 

851 TCATTTTAAG TTTCAGTGTA CCAACGATAA GGGCATTTGG AACAGTGCTA 

901 TCAGGTGAGC TCAGTGGTGC TGTTGTAGGT TCAGAAATGG AAATATGTAA 

951 GGGAGGTCAC ACATACACTT TACCTGTATG TTCAACCTAT GTTATCAAAC
1001 AAACCAATTC ACCAATAATA GCATGATTAG TAGGGATTCC CAAAAAGTTT 
1051 TTAAAAACAC GAACAGGATT TTAATGATAA TTAAATTTGC AGTGGAAAGG 
1101 TCTCATTTAA TGGTTTTCAA GGAAATGGGA TTTGGTTGCT GACATGAATT 
1151 GATGATATTA GTAATATTTA TAAAGCCTTT CAAACTTCCA TCAATCCTAA 
1201 GCTAAAAATC TTTATTACCT GTATATCCTT TTCAGTTAAC TGAGAGGAAG 
1251 GGATTTGGAA ACCATGTACT TTTGGGGAGT AATTGATTAA AAACAATGGC 
1301 TGATTGGCAT TGTTAATGAA GGCTTTATTT GTGAGGATGA TGCTGGTAAA 
1351 TGGAGCATGC TTAGAGTACT AAATTGATCT AATGAGAATT TGGATGAACA 
1401 TAAACTTAAT TTTGGATTTA ATATAACATT CCAGTCAGAC GCATGTAAAC 
1451 AGAATATTTG AATCTTTGTA CCTCCATACA AGTGTTAGCC TGCCAGGCTG 
1501 TAAGCTTACC TTAATTAAAC TTTCAGTGAA AGTGGAATTA TTAAGATATA 
1551 AATTTATATT TGTGCTTTTT GTCAGTGTGT AAGCTGTGTA GAAATTCTTT 
1601 GATGTATTAG TTGTATTAAT GTAAAGTAGA AACCCATTGT TGAAACTCCT 
1651 GTAGCTATTA TGCTTTTAAT ATTGTTTTAA TGTTCTTCCT TAGAAATAGG 
1701 CCCATAAAAA TGGTCTGGAA GCCAAACCAA AGTATGGTAT AATGTAGATA 
1751 TTGTAAAGCA GTAAACTGAA AACATGTCCT GGCATGTATT CAGCCATGTT 
1801 TAAGTGACTT TTCTGTAATT GTAAAATAAA AACTTCAAAT GGGACCTAAA 
1851 ACAGTGATGT AAAAGAACTG GTTTTGGAAA TTTAGCCTAA TTTATCTATA 
1901 AGATGGCTGC TAAATTGATT TTTCAGTTCT TTTTATCATC TAAAATATAA 
1951 TAGATATAGA AATGAATAAT ATGAAGAACA GTAGTTTGCT TTGAAATACT 
2001 AATAAACTTT TATTTAAGAT GCTTCATTTT TACTTCTTAA AACGTGCTTT 
2051 GGATTCTTAA ATTTTGTTTC ACTGAATGTT CAATGTTTTA AATGGCGATT 
2101 AAAATACTCT GCTGTATATA GTAGTTTTTG AGTAAATATT TGCAATAAAA 
2151 ATCTGCCCCC GAAAAAAAAA AAAAAA 



BLAST Results 



Entry G35287 from database EMBL: 
human STS SHGC-37375. 
Score = 2163, P = 2.8e-91, identities = 437/441
Medline entries



No Medline entry 



Peptide information for frame 2 



ORF from 182 bp to 841 bp; peptide length: 220 
Category: putative protein 



1 MAERETETSN SESKQDKAAS SKEKNGCNAN SFEGSSTTKS EESITVSDKE 
51 NETCLADQET GSKNIVSCDS NIGADKVEKK KQIQHVCQEM ELKMCQSSEN
101 IILSDQIKDH NSSEARFSSK NIKDLRLASD NVSIDQFLRK RHEPESVSSD 
151 VSEQGSIHLE PLTPSEVLEY EATEILQKGS GDPSAKTDEV VSDQTDDIPG 
201 GNNPSTTEAT VDLEDEKERS 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_2hl0, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2hl0, frame 2



Report for DKFZphfbr2_2hl0.2

[LENGTH] 220

[MW] 24109.02

[pI] 4.51

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YKR092c] 4e-05 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR092c] 4e-05 

[PROSITE] MYRISTYL 3 

[PROSITE] CK2_PHOSPHO_SITE 8

[PROSITE] PKC_PHOSPHO_SITE 5

[PROSITE] ASN_GLYCOSYLATION 3 

[PFAM] TNFR/NGFR cysteine-rich region 

[KW] Alpha_Beta 



SEQ MAERETETSNSESKQDKAASSKEKNGCNANSFEGSSTTKSEESITVSDKENETCLADQET 

PRD cccccccccccccchhhhhhhhccccccccccccccccceeeeeeeeccccccccccccc 

SEQ GSKNIVSCDSNIGADKVEKKKQIQHVCQEMELKMCQSSENIILSDQIKDHNSSEARFSSK 

PRD cccceeeecccccchhhhhhhhhhhhhhhhhhhhhhccceeeeccccccccccccccccc 

SEQ NIKDLRLASDNVSIDQFLRKRHEPESVSSDVSEQGSIHLEPLTPSEVLEYEATEILQKGS 

PRD cchhhhhhcccchhhhhhhhcccccccccccccccceeecccccccchhhhhhhcccccc 

SEQ GDPSAKTDEVVSDQTDDIPGGNNPSTTEATVDLEDEKERS

PRD ccccccccccccccccccccccccccceeeehhhhhhccc 



Prosite for DKFZphfbr2_2hl0.2

PS00001 51->55 ASN_GLYCOSYLATION PDOC00001
PS00001 111->115 ASN_GLYCOSYLATION PDOC00001
PS00001 131->135 ASN_GLYCOSYLATION PDOC00001
PS00005 20->23 PKC_PHOSPHO_SITE PDOC00005
PS00005 37->40 PKC_PHOSPHO_SITE PDOC00005
PS00005 47->50 PKC_PHOSPHO_SITE PDOC00005
PS00005 118->121 PKC_PHOSPHO_SITE PDOC00005
PS00005 184->187 PKC_PHOSPHO_SITE PDOC00005
PS00006 9->13 CK2_PHOSPHO_SITE PDOC00006
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006
PS00006 20->24 CK2_PHOSPHO_SITE PDOC00006
PS00006 38->42 CK2_PHOSPHO_SITE PDOC00006
PS00006 45->49 CK2_PHOSPHO_SITE PDOC00006
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006
PS00006 163->167 CK2_PHOSPHO_SITE PDOC00006
PS00006 205->209 CK2_PHOSPHO_SITE PDOC00006
PS00008 26->32 MYRISTYL PDOC00008
PS00008 34->40 MYRISTYL PDOC00008
PS00008 201->207 MYRISTYL PDOC00008
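The PS numbers above are PROSITE pattern entries; the published patterns are N-{P}-[ST]-{P} for PS00001 (N-glycosylation), [ST]-x-[RK] for PS00005 (protein kinase C), [ST]-x(2)-[DE] for PS00006 (casein kinase II) and G-{EDRKHPFYW}-x(2)-[STAGCN]-{P} for PS00008 (N-myristoylation). Translating a pattern into a regular expression and scanning with a lookahead, so that overlapping sites are kept, reproduces the listed start positions for DKFZphfbr2_2hl0.2 on its first 60 residues. The listed end coordinates appear to be one past the last matched residue (51->55 for a four-residue site). A minimal sketch with hypothetical helper names, covering only the pattern syntax needed here:

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Handles single residues, 'x', [..] alternatives, {..} exclusions and
    repeat counts such as x(2) or x(2,3); the '<' and '>' anchors are
    not supported in this sketch.
    """
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            rx = "."
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"
        else:
            rx = core  # a residue letter or a [..] class is already valid regex
        if low and high:
            rx += "{" + low + "," + high + "}"
        elif low:
            rx += "{" + low + "}"
        pieces.append(rx)
    return "".join(pieces)


def find_sites(seq: str, pattern: str) -> list[int]:
    """1-based start positions of all, possibly overlapping, matches."""
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() + 1 for m in rx.finditer(seq)]
```

On residues 1-60 (MAERETETSN ... NETCLADQET), the N-glycosylation pattern yields [51], the PKC pattern [20, 37, 47] and the CK2 pattern [9, 13, 20, 38, 45, 47], matching the table.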



Pfam for DKFZphfbr2_2hl0.2

HMM_NAME TNFR/NGFR cysteine-rich region

HMM *CpeG.tYtD.WNHvpqClpCtrCePEMGQYMvqPCTwTQNTVC*

+E+ T +D +N ++C E G+ + +C+++ +

Query 40 SEESITVSDKEN--ETC--LADQET--GSKNIVSCDSNIGADK 76
group: intracellular transport and trafficking

DKFZphfbr2_2il7.3 encodes a novel 201 amino acid putative GTP-binding protein related to
Rab1B.

Rab proteins are members of the Ras superfamily of GTPases. Rab proteins are localised to the
cytoplasmic side of organelles and vesicles involved in the secretory (biosynthetic) and
endocytic pathways in eukaryotic cells. Rab proteins direct the targeting and fusion of
transport vesicles to their acceptor membranes. Rab1B is essential for the intracellular
transport of the nascent low density lipoprotein (LDL) receptor. It has been proposed as a universal
mediator of endoplasmic reticulum-to-Golgi transport of membrane glycoproteins in mammalian
cells.

The new protein can find clinical application in modulating the transport of glycoproteins 
inside cells, especially of the LDL receptor. 



Medline 

96245776: Intracellular transport and maturation of nascent low density 
lipoprotein receptor is blocked by mutation in the Ras-related 
GTP-binding protein, RAB1B 



strong similarity to rab1

complete cDNA, complete cds, start at 47, EST hits 

Sequenced by Qiagen 

Locus: unknown

Insert length: 1985 bp 

Poly A stretch at pos. 1901, polyadenylation signal at pos. 1859



1 GGGAGCAGAG TCGACTGGGA GCGACCGAGC GGGCCGCCGC CGCCGCCATG 
51 AACCCCGAAT ATGACTACCT GTTTAAGCTG CTTTTGATTG GCGACTCAGG 
101 CGTGGGCAAG TCATGCCTGC TCCTGCGGTT TGCTGATGAC ACGTACACAG 
151 AGAGCTACAT CAGCACCATC GGGGTGGACT TCAAGATCCG AACCATCGAG 
201 CTGGATGGCA AAACTATCAA ACTTCAGATC TGGGACACAG CGGGCCAGGA 
251 ACGGTTCCGG ACCATCACTT CCAGCTACTA CCGGGGGGCT CATGGCATCA
301 TCGTGGTGTA TGACGTCACT GACCAGGAAT CCTACGCCAA CGTGAAGCAG 
351 TGGCTGCAGG AGATTGACCG CTATGCCAGC GAGAACGTCA ATAAGCTCCT 
401 GGTGGGCAAC AAGAGCGACC TCACCACCAA GAAGGTGGTG GACAACACCA 
451 CAGCCAAGGA GTTTGCAGAC TCTCTGGGCA TCCCCTTCTT GGAGACGAGC 
501 GCCAAGAATG CCACCAATGT CGAGCAGGCG TTCATGACCA TGGCTGCTGA 
551 AATCAAAAAG CGGATGGGGC CTGGAGCAGC CTCTGGGGGC GAGCGGCCCA 
601 ATCTCAAGAT CGACAGCACC CCTGTAAAGC CGGCTGGCGG TGGCTGTTGC 
651 TAGGAGGGGC ACATGGAGTG GGACAGGAGG GGGCACCTTC TCCAGATGAT 
701 GTCCCTGGAG GGGGGAGGAG GTACCTCCCT CTCCCTCTCC TGGGGCATTT 
751 GAGTCTGTGG CTTTGGGGTG TCCTGGGCTC CCCATCTCCT TCTGGCCCAT 
801 CTGCCTGCTG CCCTGAGCCC CGGTTCTGTC AGGGTCCCTA AGGGAGGACA 
851 CTCAGGGCCT GTGGCCAGGC AGGGCGGAGG CCTGCTGTGC AGTTGCCTCT 
901 AGGTGACTTT CCAAGATGCC CCCCTACACA CCTTTCTTTG GAACGAGGGC 
951 TCTTCTGTCG GTGTCCCTCC CACCCCCATG TATGCTGCAC TGGGTTCTCT 
1001 CCTTCTTCTT CCTGCTGTCC TGCCCAAGAA CTGAGGGTCT CCCCGGCCTC 
1051 TACTGCCCTG GCTGCAGTCA GTGCCCAGGG CGAGGAATGT GGCCAGGGGA 
1101 TCCAGGACCT GGGATCCAGG GCCCTGGGCT GGACCTCAGG ACAGGCATGG 
1151 AGGCCACAGG GGCCCAGCAG CCCACCCTTT CCTCTCCCCA CTGCCTCCTC 
1201 TCCCTTCCTA CACTCCCAGC TCGAGCCGTC CAGCTGCGGT GGGATCTGAG 
1251 TATATCTAGG GCGGGTGGGC GGGTAGCAGT GCTGGGCCTG TGTCTTGAGC 
1301 CTGGAGGGAG ACTGCTCCTG CCGCCCTCTG CCCTGCCGGA GACAGACCCA 
1351 TGCGCTGCCT GCCCACCGTG CCCCTTTGTC CCCATGTCAG GCGGAGGCGG 
1401 AAGGCCCACC GTGCCAGAGG CTGGGCACCA GCCTTAACCC TCACTCTGCT 
1451 AGCACCTCCT CCCTTTCCCC AAGGTAGCAC ATCTGGCTCA CTCCCCACTC 
1501 CGTCTCTGGA GCCCACCAGG GAAGGCCCTC ATCCCCTGCC GCTACTTCTC 
1551 TGGGGAATGT GGGTTCCATC CAGGATTGGG GGCCTCTCTG CTCACCCACT 
1601 CTGCACCCAG GATCCTAGTC CCCTGCCCTC TGGCACAGCT GCTTCCTGCA 
1651 AGAAAGCAAG TCTTTGGTCT CCCTGAGAAG CCATGTCCCT CGTGCTGTCT 
1701 CTTGCCTGTC CCACCTGTGC CCTGCCCTCC AGCTTGTATT TAAGTCCCTG 
1751 GGCTGCCCCC TTGGGGTGCC CCCCGCTCCC AGGTTCCCCT CTGGTGTCAT
1801 GTCAGGCATT TTGCAAGGAA AAGCCACTTG GGGAAAGATG GAAAAGGACA 
1851 AAAAAAATTA ATAAATTTCC ATTGGCCCTC GGGTGAGCTG AGGGTTTTTG 
1901 CAAGGAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1951 AAAAAAAAAA AAAAGAAAAA AAAAAAAAAA AAAAA 
BLAST Results



No BLAST result 



Medline entries 



91115900: 

A family of ras-like GTP-binding proteins expressed in electromotor
neurons.



Peptide information for frame 3 



ORF from 48 bp to 650 bp; peptide length: 201 
Category: strong similarity to known protein 



1 MNPEYDYLFK LLLIGDSGVG KSCLLLRFAD DTYTESYIST IGVDFKIRTI 
51 ELDGKTIKLQ IWDTAGQERF RTITSSYYRG AHGIIVVYDV TDQESYANVK 
101 QWLQEIDRYA SENVNKLLVG NKSDLTTKKV VDNTTAKEFA DSLGIPFLET 
151 SAKNATNVEQ AFMTMAAEIK KRMGPGAASG GERPNLKIDS TPVKPAGGGC 
201 C 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2il7, frame 3 

SWISSPROT:RB1B_RAT RAS-RELATED PROTEIN RAB-1B., N = 1, Score = 1023, P = 2.7e-103

PIR:S06147 GTP-binding protein rab1B - rat, N = 1, Score = 1013, P = 3.2e-102

SWISSPROT:RAB1_DISOM RAS-RELATED PROTEIN ORAB-1., N = 1, Score = 967, P = 2.4e-97

PIR:TVHUYP GTP-binding protein Rab1 - human, N = 1, Score = 966, P = 3e-97

>SWISSPROT:RB1B_RAT RAS-RELATED PROTEIN RAB-1B.
Length = 201 

HSPs: 

Score = 1023 (153.5 bits), Expect = 2.7e-103, P = 2.7e-103 
Identities = 197/201 (98%), Positives = 199/201 (99%)

Query: 1 MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 60
MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ
Sbjct: 1 MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 60

Query: 61 IWDTAGQERFRTITSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 120
IWDTAGQERFRT+TSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG
Sbjct: 61 IWDTAGQERFRTVTSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 120

Query: 121 NKSDLTTKKVVDNTTAKEFADSLGIPFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 180
NKSDLTTKKVVDNTTAKEFADSLG+PFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG
Sbjct: 121 NKSDLTTKKVVDNTTAKEFADSLGVPFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 180

Query: 181 GERPNLKIDSTPVKPAGGGCC 201
GERPNLKIDSTPVK A GGCC
Sbjct: 181 GERPNLKIDSTPVKSASGGCC 201

Pedant information for DKFZphfbr2_2il7, frame 3

Report for DKFZphfbr2_2il7.3

[LENGTH] 201

[MW] 22171.25

[pI] 5.56

[HOMOL] SWISSPROT:RB1B_RAT RAS-RELATED PROTEIN RAB-1B. 1e-112

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YFL038c] 2e-77

[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YFL038c] 2e-77

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YFL005w] 4e-57

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YFL005w] 4e-57

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL005w] 4e-57

[FUNCAT] 08.19 cellular import [S. cerevisiae, YER031c] 8e-46 

[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YER031c] 8e-46 

[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210w] 1e-44

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YOR089c] 1e-30

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YNL098c] 3e-25

[FUNCAT] 11.01 stress response [S. cerevisiae, YNL098c] 3e-25

[FUNCAT] 03.99 other cell growth, cell division and dna synthesis activities [S. cerevisiae, YNL098c] 3e-25

[FUNCAT] 01.03.13 regulation of nucleotide metabolism [S. cerevisiae, YNL098c] 3e-25

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YNL098c] 3e-25

[FUNCAT] 10.04.07 g-proteins [S. cerevisiae, YNL098c] 3e-25

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL098c] 3e-25

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR101w] 9e-24

[FUNCAT] 11.10 cell death [S. cerevisiae, YOR101w] 9e-24

[FUNCAT] 04.07 rna transport [S. cerevisiae, YOR185c] 4e-23

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YOR185c] 4e-23 

[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YOR185c] 4e-23 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YPR165w] 7e-17 

[FUNCAT] 10.02.07 g-proteins [S. cerevisiae, YPR165w] 7e-17 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YCR027c] 1e-16

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 1e-11

[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YLR229c] 1e-11

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDL192w] 4e-10

[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL180c] 9e-09

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation,
palmitylation, farnesylation and processing) [S. cerevisiae, YPL051w] 3e-08

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YAL048c] 5e-05 

[BLOCKS] BL01019A ADP-ribosylation factors family proteins 

[BLOCKS] BL01115A GTP-binding nuclear protein ran proteins 

[SCOP] dlplk 3.25.1.3.1 cH-p21 Ras protein [human (Homo sapiens) 2e-41 

[SCOP] dlguaa_ 3.25.1.3.10 RaplA [Human (Homo sapiens) 5e-60 

[SCOP] dlrrga_ 3.25.1.3.5 ADP-ribosylation factor 1 (ARF1) [rat (Rattu 2e-30 

[SCOP] dlhura_ 3.25.1.3.4 ADP-ribosylation factor 1 (ARF1) [human (Horn 2e-33 

[PIRKW] nucleus 1e-21

[PIRKW] membrane trafficking 1e-110

[PIRKW] oncogene 1e-25

[PIRKW] endoplasmic reticulum 1e-105

[PIRKW] phosphoprotein 1e-105

[PIRKW] glycoprotein 3e-25

[PIRKW] prenylated cysteine 1e-110

[PIRKW] signal transduction 4e-23

[PIRKW] transforming protein 1e-105

[PIRKW] purine nucleotide binding 2e-24

[PIRKW] alternative splicing 5e-26

[PIRKW] P-loop 1e-110

[PIRKW] lipoprotein 1e-110

[PIRKW] proto-oncogene 3e-27

[PIRKW] methylated carboxyl end 3e-27

[PIRKW] hydrolase 7e-25

[PIRKW] membrane protein 1e-105

[PIRKW] GTP binding 1e-110

[PIRKW] thiolester bond 5e-76

[PIRKW] Golgi apparatus 1e-105

[SUPFAM] ras transforming protein 1e-110

[PROSITE] ATP_GTP_A 1

[PROSITE] MYRISTYL 2

[PROSITE] CK2_PHOSPHO_SITE 5

[PROSITE] SIGMA54_INTERACT_1 1

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 4

[PROSITE] ASN_GLYCOSYLATION 3

[PFAM] Ras family (contains ATP/GTP binding P-loop)

[KW] Alpha_Beta 

[KW] 3D 
SEQ MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ

221p- EEEEEEETTTTCHHHHHHHHHHCCCCCCCCCTTTEEEE-EEEEETTEEEEEE

SEQ IWDTAGQERFRTITSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG

221p- EEECTTTTTTCGGGHHHHHHCCEEEEEEETTBHHHHHHHHHHHHHHHHHHTTTTCEEEEE

SEQ NKSDLTTKKVVDNTTAKEFADSLGIPFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG

221p- ETTTTCCC-CCCHHHHHHHHHHCCCCEEEETTTTTTTHHHHHHHHHHHHHH

SEQ GERPNLKIDSTPVKPAGGGCC

221p-



Prosite for DKFZphfbr2_2il7.3

PS00001 121->125 ASN_GLYCOSYLATION PDOC00001
PS00001 133->137 ASN_GLYCOSYLATION PDOC00001
PS00001 154->158 ASN_GLYCOSYLATION PDOC00001
PS00002 17->21 GLYCOSAMINOGLYCAN PDOC00002
PS00005 56->59 PKC_PHOSPHO_SITE PDOC00005
PS00005 126->129 PKC_PHOSPHO_SITE PDOC00005
PS00005 135->138 PKC_PHOSPHO_SITE PDOC00005
PS00005 151->154 PKC_PHOSPHO_SITE PDOC00005
PS00006 32->36 CK2_PHOSPHO_SITE PDOC00006
PS00006 91->95 CK2_PHOSPHO_SITE PDOC00006
PS00006 135->139 CK2_PHOSPHO_SITE PDOC00006
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006
PS00006 179->183 CK2_PHOSPHO_SITE PDOC00006
PS00007 27->34 TYR_PHOSPHO_SITE PDOC00007
PS00008 18->24 MYRISTYL PDOC00008
PS00008 176->182 MYRISTYL PDOC00008
PS00017 15->23 ATP_GTP_A PDOC00017
PS00675 11->25 SIGMA54_INTERACT_1 PDOC00579



Pfam for DKFZphfbr2_2il7.3

HMM_NAME Ras family (contains ATP/GTP binding P-loop)

HMM *KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDFYtKTIEIDGKtIK
KL+LIGDSGVGKSCLL+RF +++++E+YI+TIGVDF+++TIE+DGKTIK
Query 10 KLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIK 58

HMM LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFENIrNWweEIrR
LQIWDTAGQER+R+++++YYRGA+G+++VYD+T+++S+ N+++W++EI+R
Query 59 LQIWDTAGQERFRTITSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDR 108

HMM HCDrDENVPIMLVGNKCDLEDQRQVStEEGQeFAREWGAIPFMETSAKTN
+++ ENV ++LVGNK+DL +++V+ +++EFA+++G IPF+ETSAK++
Query 109 YAS--ENVNKLLVGNKSDLTTKKVVDNTTAKEFADSLG-IPFLETSAKNA 155

HMM iNVEEAFMEIvRellqrMqe.q.NqteNinidQpsrnrk...rCCCIM*
+NVE+AFM+++ EI++RM+ +++E +N++ +S++ K +CC
Query 156 TNVEQAFMTMAAEIKKRMGPGAASGGERPNLKIDSTPVKPAGGGCC--
DKFZphfbr2_2kl9



group: brain derived 

DKFZphfbr2_2kl9 encodes a novel 303 amino acid protein with similarity to the human KIAA0378
product.

The protein contains a leucine zipper, which can mediate protein-protein interactions.
No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to KIAA0378 

encoded by the genomic clones HS147M19/HS608E8 

Sequenced by Qiagen 

Locus: unknown

Insert length: 1931 bp

Poly A stretch at pos. 1866, no polyadenylation signal found 



1 GGGGGGGGCG CGCGGTGACA GCGCGGGGTT GGCGGCGTGG GACCCAGGGG
51 GCGACAGAGG CAGCAGCAGC CCGAGGCCTG AGGAGAGGAG ACCGGCGGCG
101 GCGGCAATGC TGGAGACCCT TCGCGAGCGG CTGCTGAGCG TGCAGCAGGA
151 TTTCACCTCC GGGCTGAAGA CTTTAAGTGA CAAGTCAAGA GAAGCAAAAG
201 TGAAAAGCAA ACCCAGGACT GTTCCATTTT TGCCAAAGTA CTCTGCTGGA
251 TTAGAATTAC TTAGCAGGTA TGAGGATACA TGGGCTGCAC TTCACAGAAG
301 AGCCAAAGAC TGTGCAAGTG CTGGAGAGCT GGTGGATAGC GAGGTGGTCA
351 TGCTTTCTGC GCACTGGGAG AAGAAAAAGA CAAGCCTCGT GGAGCTGCAA
401 GAGCAGCTCC AGCAGCTCCC AGCTTTAATC GCAGACTTAG AATCCATGAC
451 AGCAAATCTG ACTCATTTAG AGGCGAGTTT TGAGGAGGTA GAGAACAACC
501 TGCTGCATCT GGAAGACTTA TGTGGGCAGT GTGAATTAGA AAGATGCAAA
551 CATATGCAGT CCCAGCAACT GGAGAATTAC AAGAAAAATA AGAGGAAGGA
601 ACTTGAAACC TTCAAAGCTG AACTAGATGC AGAGCACGCC CAGAAGGTCC
651 TGGAAATGGA GCACACCCAG CAAATGAAGC TGAAGGAGCG GCAGAAGTTT
701 TTTGAGGAAG CCTTCCAGCA GGACATGGAG CAGTACCTGT CCACTGGCTA
751 CCTGCAGATT GCAGAGCGGC GAGAGCCCAT AGGCAGCATG TCATCCATGG
801 AAGTGAACGT GGACATGCTG GAGCAGATGG TCCTGATGGA CATATCGGAC
851 CAGGAGGCCC TGGACGTCTT CCTGAACTCT GGAGGAGAAG AGAACACTGT
901 GCTGTCCCCC GCCTTAGGTA GGGTTGACAA ACTTGCATTA GCTGAACCAG
951 GGCAGTATCG ATGCCACTCC CCTCCAAAGG TGAGACGTGA GAACCATCTG
1001 CCAGTCACTT ACGCATAAAC CCCCAAGCTC ACAGCCAGCT CCTGGCTCCC
1051 TAACCCCACG GTTCCACACG GCTGTGTGGC AGCTGCAACA GTGGTGTGGT
1101 TCCGTCATGA ATTCTTCTCA AAGATTTGAC ATGCTCCACT CCGGTAACTT
1151 TGGTGAGTTG AGAGCTTTCT TGTTTGTTTT CCCTCCTTTA CCATCCAGAA
1201 ATCCATTTGA GTCTGCTCCT TGTGGTTAAG GACTGGCGTT TGCAGGGAGG
1251 TGCGGACTCT CCTGCGGGGC TCACGGGAAA CTCTTCCCTC TTCGTGCGAC
1301 AGGCATTTAG GGGCGTGCCT GCCATGGGCA AAGCCATGGT GTGTGTTCAG
1351 CTCTTGGCCT GTGTTGTAAA CTTAGTTGCA CTTCAGTTCC TTTCATCCCT
1401 TCACAAAATT TTGTTTCACA TTCATGCAGC AAATATGGGC TGAGGTGCCA
1451 GACCTGTACC TGGGCTTGGT GCGTTTCAAA TTTCAGACCA GTTCTTTGGG
1501 CTGGGTCAAG GCAAAGCTCA GTCGTCCCAG CAGCACCTCA GCCATCTGTA
1551 GAAGGTTCTA CCATTACCAC GGTTTCAGCT TCCTCTAAAC TTCTCACCCG
1601 CTTCTCCTGG CAATCTGTCA GAACGGTGTC ATCCTGGGGA AGAGAAGGAG
1651 CTTGGGTGCA TTTGCCCTCA TCCTGAGAAG GCCAGAATAC TGGAGACCAG
1701 CGTGAACCCT CACCCAGAGT CAGGGGAAGA TTTAGAAACA GTGACACCTG
1751 CATATAGAAT TTTGATTCCT TGAAGAGCCT ATTTAGTTCC ATAAAATTGG
1801 AGAACTGCTG AAGGTCAGTA ATTCCGACTT TCTCAGCAGT GGTGTCTCTG
1851 AATTACTGCA AAGGGTAAAA AAAAAAAAAA AAAAAACTTA TCGATACCGT
1901 CGACCTCGAT GATGATGATG ATGATGTCGA C



BLAST Results 



Entry HS147M19 from database EMBL: 

Homo sapiens DNA sequence from PAC 147M19 on chromosome 6p22. 1-22.3. 

Contains an unknown gene, ESTs and GSSs.

Score = 5540, P = 4.1e-275, identities = 1114/1120

3 exons 592-1884 

Entry HS608E8 from database EMBL: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 608E8 
Score = 797, P = 1.2e-78, identities = 161/163
6 exons 1-592



Medline entries 



90294724: 

The involucrin gene of the gibbon: The middle region shared by the 
hominoids 



Peptide information for frame 2 



ORF from 107 bp to 1015 bp; peptide length: 303 
Category: similarity to known protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (97-119) 



1 MLETLRERLL SVQQDFTSGL KTLSDKSREA KVKSKPRTVP FLPKYSAGLE 
51 LLSRYEDTWA ALHRRAKDCA SAGELVDSEV VMLSAHWEKK KTSLVELQEQ 
101 LQQLPALIAD LESMTANLTH LEASFEEVEN NLLHLEDLCG QCELERCKHM 
151 QSQQLENYKK NKRKELETFK AELDAEHAQK VLEMEHTQQM KLKERQKFFE 
201 EAFQQDMEQY LSTGYLQIAE RREPIGSMSS MEVNVDMLEQ MVLMDISDQE 
251 ALDVFLNSGG EENTVLSPAL GRVDKLALAE PGQYRCHSPP KVRRENHLPV 
301 TYA 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2kl9, frame 2

TREMBL:HSAB2376_1 gene: "KIAA0378"; Human mRNA for KIAA0378 gene,
partial cds., N = 1, Score = 137, P = 4.8e-06

PIR:I37037 involucrin - common gibbon, N = 1, Score = 124, P = 7.4e-05

PIR:A57013 early endosome antigen 1 - human, N = 1, Score = 128, P = 9.5e-05



>TREMBL:HSAB2376_1 gene: "KIAA0378"; Human mRNA for KIAA0378 gene, partial
cds.

Length = 808 

HSPs: 



Score = 137 (20.6 bits), Expect = 4.8e-06, P = 4.8e-06
Identities = 59/222 (26%), Positives = 103/222 (46%) 



Query: 2 LETLRERLLSVQQDFTSGLKTL---SDKSREAKVKS-KPRTVPFLPKYSAGLELLSRYED 57
L TL E L S ++ LK D+ R +++S + K +A L+ E
Sbjct: 434 LATLEEAL-SEKERIIERLKEQRERDDRERLEEIESFRKENKDLKEKVNALQAELTEKES 492

Query: 58 TWAALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPALIADLESMTAN 117
+ L A ASAG DS++ L E+KK +L+ QL++ ID M
Sbjct: 493 SLIDLKEHASSLASAGLKRDSKLKSLEIAIEQKKEECSKLEAQLKKAHN-IEDDSRMNPE 551

Query: 118 LTHLEASFEEVENNLLHLEDLCG--QCELERCKHMQSQQLENYKKNKRK---ELETFKAE 172
++++ + D CG Q E++R + +++EN K +K K ELE+
Sbjct: 552 FAD---QIKQLDKEASYYRDECGKAQAEVDRLLEIL-KEVENEKNDKDKKIAELESLTLR 607

Query: 173 LDAEHAQKVLEMEHTQQMKLKERQKFFEEAFQQDMEQYLSTGYLQIAE 220
+ +KV ++H QQ++ K+ + EE +++ ++ +LQI E
Sbjct: 608 HMKDQNKKVANLKHNQQLEKKKNAQLLEEVRRREDSMADNSQHLQIEE 655




Score = 100 (15.0 bits), Expect = 6.2e-02, P = 6.0e-02
Identities = 44/156 (28%), Positives = 76/156 (48%)

Query: 57 DTWAALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPAL-IADLESMT 115
D A+ +R +C A VD + +L E +K + +L+ L + D
Sbjct: 560 DKEASYYR--DECGKAQAEVDRLLEILK-EVENEKNDKDKKIAELESLTLRHMKDQNKKV 616

Query: 116 ANLTHLEASFEEVENNLLHLEDLCGQCE--LERCKHMQSQQLENYKKNKRKELETFKAEL 173
ANL H + E+ +N L LE++ + + + +H+Q ++L N + R+EL+ KA L
Sbjct: 617 ANLKHNQ-QLEKKKNAQL-LEEVRRREDSMADNSQHLQIEELMNALEKTRQELDATKARL 674 

Query: 174 DAEHAQKVLEME-HTQQMKLKERQKFFEEAFQQDMEQYLS 212 

A Q + E E H +++ ER+K EE + E L+ 
Sbjct: 675 -ASTQQSLAEKEAHLANLRI-ERRKQLEEILEMKQEALLA 712

Pedant information for DKFZphfbr2_2kl9, frame 2 

Report for DKFZphfbr2_2kl9.2

[LENGTH] 303

[MW] 34814.78

[pI] 5.23

[PROSITE] LEUCINE_ZIPPER 1

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 3.63 %

[KW] COILED_COIL 14.52 % 

SEQ MLETLRERLLSVQQDFTSGLKTLSDKSREAKVKSKPRTVPFLPKYSAGLELLSRYEDTWA 

SEG 

PRD ccchhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccchhhhhhhhhhhhchhh 
COILS 



SEQ ALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPALIADLESMTANLTH

SEG xx xxxxx xxx x 

PRD hhhhhhhhchhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LEASFEEVENNLLHLEDLCGQCELERCKHMQSQQLENYKKNKRKELETFKAELDAEHAQK 

SEG 

PRD hhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCC 

SEQ VLEMEHTQQMKLKERQKFFEEAFQQDMEQYLSTGYLQIAERREPIGSMSSMEVNVDMLEQ 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhcccccccccchhhhhhhhh 



COILS 



SEQ MVLMDISDQEALDVFLNSGGEENTVLSPALGRVDKLALAEPGQYRCHSPPKVRRENHLPV 

SEG 

PRD hhhhhhchhhhhhhhhccccccceeeccccccccceeeccccccccccccceeecccccc 

COILS 



SEQ TYA 
SEG 

PRD ccc 
COILS 



Prosite for DKFZphfbr2_2kl9.2
PS00029 97->119 LEUCINE_ZIPPER PDOC00029 

(No Pfam data available for DKFZphfbr2_2kl9.2)

DKFZphfbr2_2kl4



group: cell cycle 

DKFZphfbr2_2kl4 encodes a novel 335 amino acid protein with strong similarity to the Rattus rattus
IAG2 "implantation-associated protein" and to the product of the human N33 tumour-suppressor gene.

Tumour-suppressor genes are involved in the control of cell growth and division and
interact with proteins that control the cell cycle. The N33 gene is significantly
methylated in tumour cells, a mechanism by which tumour-suppressor genes are inactivated in
cancer. In addition, the novel protein contains an RGD cell attachment site. The novel
protein is therefore a new putative tumour suppressor.

The new protein can find application in modulating/blocking the cell cycle and in the therapy 
of tumours. 



strong similarity to human N33 tumour-suppressor gene
complete cDNA, complete cds, EST hits

potential start at Bp 30 matches Kozak consensus ANCatgG
potential transmembrane protein (4 TM)

similarity to yeast OST3p (oligosaccharyltransferase gamma chain)

Sequenced by Qiagen

Locus: unknown

Insert length: 2241 bp

Poly A stretch at pos. 2221, no polyadenylation signal found
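The annotation that the potential start at Bp 30 matches the Kozak consensus ANCatgG can be checked mechanically. The Python sketch below writes the consensus as the regular expression `A[ACGT]CATGG` (our translation of the ANCatgG notation) and reports the 1-based position of every ATG found in that context; `kozak_starts` is a hypothetical helper, applied here to the first 40 bp of the DKFZphfbr2_2kl4 insert listed below.

```python
import re


def kozak_starts(cdna: str, context: str = "A[ACGT]CATGG") -> list[int]:
    """Return 1-based positions of ATG codons that sit in an ANCatgG context."""
    cdna = cdna.upper()
    # Zero-width lookahead so that overlapping candidate sites are all found.
    # The context begins 3 bases upstream of the A of ATG: +3 to reach it, +1 for 1-based.
    return [m.start() + 4 for m in re.finditer(f"(?={context})", cdna)]


# Positions 1-40 of the DKFZphfbr2_2kl4 insert
head_2kl4 = "TGGGACTTAT AGAAGGGAGA GGAGCGAACA TGGCAGCGCG".replace(" ", "")
print(kozak_starts(head_2kl4))  # [30]
```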



1 TGGGACTTAT AGAAGGGAGA GGAGCGAACA TGGCAGCGCG TTGGCGGTTT 
51 TGGTGTGTCT CTGTGACCAT GGTGGTGGCG CTGCTCATCG TTTGCGACGT 
101 TCCCTCAGCC TCTGCCCAAA GAAAGAAGGA GATGGTGTTA TCAGAAAAGG 
151 TTAGTCAGCT GATGGAATGG ACTAACAAAA GACCTGTAAT AAGAATGAAT 
201 GGAGACAAGT TCCGTCGCCT TGTGAAAGCC CCACCGAGAA ATTACTCCGT 
251 TATCGTCATG TTCACTGCTC TCCAACTGCA TAGACAGTGT GTCGTTTGCA 
301 AGCAAGCTGA TGAAGAATTC CAGATCCTGG CAAACTCCTG GCGATACTCC 
351 AGTGCATTCA CCAACAGGAT ATTTTTTGCC ATGGTGGATT TTGATGAAGG 
401 CTCTGATGTA TTTCAGATGC TAAACATGAA TTCAGCTCCA ACTTTCATCA 
451 ACTTTCCTGC AAAAGGGAAA CCCAAACGGG GTGATACATA TGAGTTACAG 
501 GTGCGGGGTT TTTCAGCTGA GCAGATTGCC CGGTGGATCG CCGACAGAAC 
551 TGATGTCAAT ATTAGAGTGA TTAGACCCCC AAATTATGCT GGTCCCCTTA 
601 TGTTGGGATT GCTTTTGGCT GTTATTGGTG GACTTGTGTA TCTTCGAAGA 
651 AGTAATATGG AATTTCTCTT TAATAAAACT GGATGGGCTT TTGCAGCTTT 
701 GTGTTTTGTG CTTGCTATGA CATCTGGTCA AATGTGGAAC CATATAAGAG 
751 GACCACCATA TGCCCATAAG AATCCCCACA CGGGACATGT GAATTATATC 
801 CATGGAAGCA GTCAAGCCCA GTTTGTAGCT GAAACACACA TTGTTCTTCT 
851 GTTTAATGGT GGAGTTACCT TAGGAATGGT GCTTTTGTGT GAAGCTGCTA 
901 CCTCTGACAT GGATATTGGA AAGCGAAAGA TAATGTGTGT GGCTGGTATT 
951 GGACTTGTTG TATTATTCTT CAGTTGGATG CTCTCTATTT TTAGATCTAA 
1001 ATATCATGGC TACCCATACA GCTTTCTGAT GAGTTAAAAA GGTCCCAGAG 
1051 ATATATAGAC ACTGGAGTAC TGGAAATTGA AAAACGAAAA TCGTGTGTGT 
1101 TTGAAAAGAA GAATGCAACT TGTATATTCT GTATTACCTC TTTTTTTCAA 
1151 GTGATTTAAA TAGTTAATCA TTTAACCAAA GAAGATGTGT AGTGCCTTAA 
1201 CAAGCAATCC TCTGTCAAAA TCTGAGGTAT TTGAAAATAA TTATCCTCTT 
1251 AACCTTCTCT TCCCAGTGAA CTTTATGGAA CATTTAATTT AGTACAATTA 
1301 AGTATATTAT AAAAATTGTA AAACTACTAC TTTGTTTTAG TTAGAACAAA 
1351 GCTCAAAACT ACTTTAGTTA ACTTGGTCAT CTGATCTTAT ATTGCCTTAT 
1401 CCAAAGATGG GGAAAGTAAG TCCTGACCAG GTGTTCCCAC ATATGCCTGT 
1451 TACAGATAAC TACATTAGGA ATTCATTCTT AGCTTCTTCA TCTTTGTGTG 
1501 GATGTGTATA CTTTACGCAT CTTTCCTTTT GAGTAGAGAA ATTATGTGTG 
1551 TCATGTGGTC TTCTGAAAAT GGAACACCAT TCTTCAGAGC ACACGTCTAG 
1601 CCCTCAGCAA GACAGTTGTT TCTCCTCCTC CTTGCATATT TCCTACTGCG 
1651 CTCCAGCCTG AGTGATAGAG TGAGACTCTG TCTCAAAAAA AAAGTATCTC 
1701 TAAATACAGG ATTATAATTT CTGCTTGAGT ATGGTGTTAA CTACCTTGTA 
1751 TTTAGAAAGA TTTCAGATTC ATTCCATCTC CTTAGTTTTC TTTTAAGGTG 
1801 ACCCATCTGT GATAAAAATA TAGCTTAGTG CTAAAATCAG TGTAACTTAT 
1851 ACATGGCCTA AAATGTTTCT ACAAATTAGA GTTTGTCACT TATTCCATTT 
1901 GTACCTAAGA GAAAAATAGG CTCAGTTAGA AAAGGACTCC CTGGCCAGGC 
1951 GCAGTGACTT ACGCCTGTAA TCTCAGCACT TTGGGAGGCC AAGGCAGGCA 
2001 GATCACGAGG TCAGGAGTTC GAGACCATCC TGGCCAACAT GGTGAAACCC 
2051 CGTCTCTACT AAAAATATAA AAATTAGCTG GGTGTGGTGG CAGGAGCCTG 
2101 TAATCCCAGC TGCACAGGAG GCTGAGGCAC GAGAATCACT TGAACTCAGG 
2151 AGATGGAGGT TTCAGTGAGC CGAGATCACG CCACTGCACT CCAGCCTGGC 
2201 AACAGAGCGA GACTCCATCT CAAAAAAAAA AAAAAAAAAA A 
BLAST Results



No BLAST result 



Medline entries 



96299740: 

Structure and methylation-associated silencing of a gene 
within a homozygously deleted region of human chromosome
band 8p22. 

97243398: 

Tumour-suppressor genes in prostatic oncogenesis: a 
positional approach. 

98334474: 

Concordant methylation of the ER and N33 genes in 
glioblastoma multiforme. 



Peptide information for frame 3 



ORF from 30 bp to 1034 bp; peptide length: 335 
Category: strong similarity to known protein 



1 MAARWRFWCV SVTMVVALLI VCDVPSASAQ RKKEMVLSEK VSQLMEWTNK 

51 RPVIRMNGDK FRRLVKAPPR NYSVIVMFTA LQLHRQCVVC KQADEEFQIL

101 ANSWRYSSAF TNRIFFAMVD FDEGSDVFQM LNMNSAPTFI NFPAKGKPKR 

151 GDTYELQVRG FSAEQIARWI ADRTDVNIRV IRPPNYAGPL MLGLLLAVIG 

201 GLVYLRRSNM EFLFNKTGWA FAALCFVLAM TSGQMWNHIR GPPYAHKNPH 

251 TGHVNYIHGS SQAQFVAETH IVLLFNGGVT LGMVLLCEAA TSDMDIGKRK 

301 IMCVAGIGLV VLFFSWMLSI FRSKYHGYPY SFLMS 
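The peptide above follows from the cDNA by translating the ORF codon by codon with the standard genetic code. A minimal Python sketch, assuming (as the coordinates in these reports imply) that the stated ORF end is the last base of the last sense codon, so that the stop codon (TAA at bp 1035-1037 for this clone) lies just outside it. `translate_orf` is a hypothetical helper, not part of the original analysis pipeline.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG x TCAG x TCAG codon order ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_orf(cdna: str, start: int, end: int) -> str:
    """Translate cdna[start..end], given as 1-based inclusive coordinates."""
    region = cdna.upper()[start - 1:end]
    if len(region) % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return "".join(CODON_TABLE[region[i:i + 3]] for i in range(0, len(region), 3))


# Positions 1-60 of the DKFZphfbr2_2kl4 insert
head_2kl4 = ("TGGGACTTAT AGAAGGGAGA GGAGCGAACA TGGCAGCGCG TTGGCGGTTT "
             "TGGTGTGTCT").replace(" ", "")
print(translate_orf(head_2kl4, 30, 59))      # MAARWRFWCV, the first ten residues
print(translate_orf("CTGATGAGTTAA", 1, 12))  # bp 1026-1037: LMS*, codons 333-335 and the stop
```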

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_2kl4, frame 3

TREMBL:RNAF8554_1 gene: "IAG2"; product: "implantation-associated
protein"; Rattus norvegicus implantation-associated protein (IAG2)
mRNA, partial cds., N = 1, Score = 1560, P = 3.4e-160

PIR:G02297 gene N33 protein - human, N = 1, Score = 1256, P = 5.6e-128

TREMBL:HSN33S11_1 gene: "N33"; product: "N33 protein form 2"; Human
N33 protein form 2 (N33) gene, exon 11 and complete cds., N = 1, Score
= 1252, P = 1.5e-127



>TREMBL: RNAF8554_1 gene: "IAG2"; product: "implantation-associated protein"; 

Rattus norvegicus implantation-associated protein (IAG2) mRNA, partial cds. 
Length = 308 

HSPs: 

Score = 1560 (234.1 bits), Expect = 3.4e-160, P = 3.4e-160
Identities = 295/307 (96%), Positives = 299/307 (97%) 

Query: 29 AQRKKEMVLSEKVSQLMEWTNKRPVIRMNGDKFRRLVKAPPRNYSVIVMFTALQLHRQCV 88 

AQRKKE VL EKV QLMEWTN+RPVIRMNGDKFR LVKAPPRNYSVIVMFTALQLHRQCV
Sbjct: 2 AQRKKEKVLVEKVIQLMEWTNQRPVIRMNGDKFRPLVKAPPRNYSVIVMFTALQLHRQCV 61 

Query: 89 VCKQADEEFQILANSWRYSSAFTNRIFFAMVDFDEGSDVFQMLNMNSAPTFINFPAKGKP 148

VCKQADEEFQILAN WRYSSAFTNRIFFAMVDFDEGSDVFQMLNMNSAPTFINFP KGKP 
Sbjct: 62 VCKQADEEFQILANFWRYSSAFTNRIFFAMVDFDEGSDVFQMLNMNSAPTFINFPPKGKP 121 

Query: 149 KRGDTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS 208

KR DTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS 
Sbjct: 122 KRADTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS 181 

Query: 209 NMEFLFNKTGWAFAALCFVLAMTSGQMWNHIRGPPYAHKNPHTGHVNYIHGSSQAQFVAE 268
NMEFLFNKTGWAFAALCFVLAMTSGQMWNHIRGPPYAHKNPHTGHVNYIHGSSQAQFVAE 
Sbjct: 182 NMEFLFNKTGWAFAALCFVLAMTSGQMWNHIRGPPYAHKNPHTGHVNYIHGSSQAQFVAE 241

Query: 269 THIVLLFNGGVTLGMVLLCEAATSDMDIGKRKIMCVAGIGLVVLFFSWMLSIFRSKYHGY 328

THIVLLFNGGVTLGMVLLCEAA SDMDIGKR++MC+AGIGLVVLFFSWMLSIFRSKYHGY 
Sbjct: 242 THIVLLFNGGVTLGMVLLCEAAASDMDIGKRRMMCIAGIGLVVLFFSWMLSIFRSKYHGY 301 

Query: 329 PYSFLMS 335 

PYSFLMS 
Sbjct: 302 PYSFLMS 308 
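In these BLAST alignments the middle (match) line shows the residue where query and subject agree, "+" for a conservative substitution and a blank otherwise, so the "Identities" and "Positives" figures can be recounted from the rows. The Python sketch below does this for the Query 269-328 / Sbjct 242-301 block above. The summary percentages in these reports appear to be truncated rather than rounded (for example, 338/349 = 96.8% is reported as 96% in the DKFZphfbr2_3cl8 report), so `blast_percent` uses integer division; all function names are hypothetical.

```python
def count_identities(query: str, sbjct: str) -> int:
    """Count alignment columns where both gapped rows carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("alignment rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


def counts_from_match_line(match: str) -> tuple[int, int]:
    """(identities, positives) read off a match line: letters, plus '+' for positives."""
    identities = sum(ch.isalpha() for ch in match)
    return identities, identities + match.count("+")


def blast_percent(numerator: int, denominator: int) -> int:
    """Whole-number percentage, truncated as the reports appear to do."""
    return 100 * numerator // denominator


query = "THIVLLFNGGVTLGMVLLCEAATSDMDIGKRKIMCVAGIGLVVLFFSWMLSIFRSKYHGY"
match = "THIVLLFNGGVTLGMVLLCEAA SDMDIGKR++MC+AGIGLVVLFFSWMLSIFRSKYHGY"
sbjct = "THIVLLFNGGVTLGMVLLCEAAASDMDIGKRRMMCIAGIGLVVLFFSWMLSIFRSKYHGY"

print(count_identities(query, sbjct), counts_from_match_line(match))  # 56 (56, 59)
print(blast_percent(295, 307), blast_percent(299, 307))               # 96 97
```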



Pedant information for DKFZphfbr2_2kl4, frame 3

Report for DKFZphfbr2_2kl4.3

[LENGTH] 335

[MW] 38036.83

[pI] 9.68

[HOMOL] TREMBL:RNAF8554_1 gene: "IAG2"; product: "implantation-associated protein";
Rattus norvegicus implantation-associated protein (IAG2) mRNA, partial cds. 1e-161

[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YOR085w] 4e-14

[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation,
palmitylation, farnesylation and processing) [S. cerevisiae, YOR085w] 4e-14

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YOR085w] 4e-14

[EC] 2.4.1.119 Dolichyl-diphosphooligosaccharide-protein glycosyltransferase 1e-12

[PIRKW] glycosyltransferase 1e-12

[PIRKW] transmembrane protein 6e-69

[PIRKW] hexosyltransferase 1e-12

[PROSITE] RGD 1

[PROSITE] MYRISTYL 4

[PROSITE] AMIDATION 1

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 4

[PROSITE] ASN_GLYCOSYLATION 2

[KW] SIGNAL_PEPTIDE 30

[KW] TRANSMEMBRANE 4

[KW] LOW_COMPLEXITY 5.97 %



SEQ MAARWRFWCVSVTMVVALLIVCDVPSASAQRKKEMVLSEKVSQLMEWTNKRPVIRMNGDK

SEG 

PRD cccceeeeeeehhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhccceeeeecccc 

MEM 



SEQ FRRLVKAPPRNYSVIVMFTALQLHRQCVVCKQADEEFQILANSWRYSSAFTNRIFFAMVD

SEG 

PRD ceeeeeccccccceeeehhhhhhccceeeehhhhhhhhhhhhhcccccccccceeeeeec 

MEM 



SEQ FDEGSDVFQMLNMNSAPTFINFPAKGKPKRGDTYELQVRGFSAEQIARWIADRTDVNIRV 

SEG 

PRD cccccceeeecccccccceeeccccccccccceeeeeeeccchhhhhhhhhhhhheeeee 

MEM M 



SEQ IRPPNYAGPLMLGLLLAVIGGLVYLRRSNMEFLFNKTGWAFAALCFVLAMTSGQMWNHIR

SEG xxxxxxxxxxxxxxxxxxxx 

PRD eccccccchhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccceeec 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMM . . . 

SEQ GPPYAHKNPHTGHVNYIHGSSQAQFVAETHIVLLFNGGVTLGMVLLCEAATSDMDIGKRK 

SEG 

PRD ccccccccccccceeeecccchhhhhhhheeeeeeccchhhhhhhhhhhhcccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ IMCVAGIGLVVLFFSWMLSIFRSKYHGYPYSFLMS 

SEG 

PRD eeeecccceeeeeehhhhhhhhhhccccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphfbr2_2kl4.3



PS00001 71->75 ASN_GLYCOSYLATION PDOC00001

PS00001 215->219 ASN_GLYCOSYLATION PDOC00001

PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005

PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005

PS00005 103->106 PKC_PHOSPHO_SITE PDOC00005

PS00005 111->114 PKC_PHOSPHO_SITE PDOC00005

PS00006 208->212 CK2_PHOSPHO_SITE PDOC00006

PS00006 292->296 CK2_PHOSPHO_SITE PDOC00006

PS00008 193->199 MYRISTYL PDOC00008

PS00008 233->239 MYRISTYL PDOC00008

PS00008 259->265 MYRISTYL PDOC00008

PS00008 278->284 MYRISTYL PDOC00008

PS00009 296->300 AMIDATION PDOC00009

PS00016 150->153 RGD PDOC00016
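The Prosite hits above can be reproduced by translating each PROSITE pattern into a regular expression and scanning the 335-residue sequence. The Python sketch below covers three of the patterns; the regex translations and the `scan` helper are ours, not the pipeline that produced the report. It also illustrates that the coordinates are written as start->end with the end one position past the last matched residue: the N-glycosylation motif N-{P}-[ST]-{P} occupies residues 71-74 (NYSV) but is listed as 71->75. The sequence is copied from the peptide listing, with residues 87-90 read as CVVC as in the BLAST alignment.

```python
import re

# DKFZphfbr2_2kl4 peptide (frame 3), copied from the listing above in 10-residue blocks
SEQ_2KL4 = "".join("""
MAARWRFWCV SVTMVVALLI VCDVPSASAQ RKKEMVLSEK VSQLMEWTNK
RPVIRMNGDK FRRLVKAPPR NYSVIVMFTA LQLHRQCVVC KQADEEFQIL
ANSWRYSSAF TNRIFFAMVD FDEGSDVFQM LNMNSAPTFI NFPAKGKPKR
GDTYELQVRG FSAEQIARWI ADRTDVNIRV IRPPNYAGPL MLGLLLAVIG
GLVYLRRSNM EFLFNKTGWA FAALCFVLAM TSGQMWNHIR GPPYAHKNPH
TGHVNYIHGS SQAQFVAETH IVLLFNGGVT LGMVLLCEAA TSDMDIGKRK
IMCVAGIGLV VLFFSWMLSI FRSKYHGYPY SFLMS
""".split())

# Hand translations of three PROSITE patterns into Python regular expressions
PATTERNS = {
    "ASN_GLYCOSYLATION": "N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": "[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "RGD": "RGD",                          # PS00016: R-G-D
}


def scan(seq: str, regex: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs: 1-based start, end one past the last matched residue."""
    hits = []
    # Lookahead with a capture group so that overlapping matches are reported too
    for m in re.finditer(f"(?=({regex}))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


for name, regex in PATTERNS.items():
    print(name, scan(SEQ_2KL4, regex))
```

Run as written, the sketch lists the same ASN_GLYCOSYLATION, PKC_PHOSPHO_SITE and RGD coordinates as the Prosite listing above.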



(No Pfam data available for DKFZphfbr2_2kl4.3)






WO 01/12659 



PCT/IB00/01496 



DKFZphfbr2_3cl8 



group: nucleic acid management 

DKFZphfbr2_3cl8 encodes a novel 448 amino acid protein with strong similarity to Mus musculus
RNA helicase and several RNA-dependent ATPases from the DEAD box family.

RNA helicases comprise a large family of proteins that are involved in basic biological
processes such as nuclear and mitochondrial splicing, RNA editing, rRNA processing,
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential
factors in cell development and differentiation, and some of them play a role in the transcription
and replication of viral single-stranded RNA genomes. The members of the largest subgroup, the
DEAD and DEAH box proteins, depend strongly on ATP hydrolysis for their unwinding activity.
The novel protein contains a DEAD box and is therefore a new member of this subgroup.
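Membership in the DEAD box subgroup rests on conserved sequence motifs. As a small illustration, the Python sketch below locates two of them in the 448-residue DKFZphfbr2_3cl8 peptide given further below: motif II, the D-E-A-D tetrapeptide that names the family, and motif VI (HRIGR). Checking only these two motifs is our simplification, and `find_motifs` is a hypothetical helper. The sequence is copied from the peptide listing, with residues 371-374 read as VSVV as in the Pedant SEQ line.

```python
import re

# DKFZphfbr2_3cl8 peptide (frame 1) in 10-residue blocks, copied from the peptide listing
SEQ_3CL8 = "".join("""
MATDSWALAV DEQEAAAESL SNLHLKEEKI KPDTNGAVVK TNANAEKTDE
EEKEDRAAQS LLNKLIRSNL VDNTNQVEVL QRDPNSPLYS VKSFEELRLP
QNLIAQSQSG TGKTAAFVLA MLSQVEPANK YPQCLCLSPT YELALQTGKV
IEQMGKFYPE LKLAYAVRGN KLERGQKISE QIVIGTPGTV LDWCSKLKFI
DPKKIKVFVL DEADVMIATQ GHQDQSIRIQ RMLPRNCQML LFSATFEDSV
WKFAQKVVPD PNVIKLKREE ETLDTIKQYY VLCSSRDEKF QALCNLYGAI
TIAQAMIFCH TRKTASWLAA ELSKEGHQVA LLSGEMMVEQ RAAVIERFRE
GKEKVLVTTN VCARGIDVEQ VSVVINFDLP VDKDGNPDNE TYLHRIGRTG
RFGKRGLAVN MVDSKHSMNI LNRIQEHFNK KIERLDTDDL DEIEKIAN
""".split())

MOTIFS = {"motif II (DEAD)": "DEAD", "motif VI (HRIGR)": "HRIGR"}


def find_motifs(seq: str, motifs: dict[str, str]) -> dict[str, list[int]]:
    """Return the 1-based start of every (possibly overlapping) match of each motif."""
    return {name: [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
            for name, pattern in motifs.items()}


print(len(SEQ_3CL8), find_motifs(SEQ_3CL8, MOTIFS))
```

This places the DEAD tetrapeptide at residues 211-214, inside the region that the Pfam "DEAD and DEAH box helicases" domain covers in the report below.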

The new protein can find application in modulating RNA metabolism and gene expression. 



strong similarity to RNA helicase and RNA-dependent ATPase 
from the DEAD box family 
group helicases 

Summary DKFZphfbr2_3cl8 encodes a novel 448 amino acid protein with
similarity to DEAD-box subfamily ATP-dependent RNA helicases.
Deletion of the yeast homologue DBP5 is lethal.



strong similarity to RNA helicase and RNA-dependent ATPase from the 
DEAD box family 

complete cDNA, EST hits 
complete cds ATG at Bp 109 

Sequenced by AGOWA 

Locus: /map="87.50 cR from top of Chr16 linkage group"
Insert length: 1713 bp 

Poly A stretch at pos. 1696, no polyadenylation signal found 



1 TGGGGTAGTG GGGCTGGAGC AGAGCCTGCC GCGAACCCCC GGAGCCCACG
51 ATCCCTCGTG CCATCCCTCG AATCCACCAG CACGAGCGTC CCACCCGCGC
101 CTGGGACCAT GGCCACTGAC TCATGGGCCC TGGCGGTGGA CGAGCAGGAA
151 GCTGCGGCTG AGTCGTTGAG CAACTTGCAT CTTAAGGAAG AGAAAATCAA
201 ACCAGATACC AATGGTGCTG TTGTCAAGAC CAATGCCAAT GCAGAGAAGA
251 CAGATGAAGA AGAGAAAGAG GACAGAGCTG CCCAGTCCTT ACTCAACAAG
301 CTGATCAGAA GCAACCTTGT TGATAACACA AACCAAGTGG AAGTCCTGCA
351 GCGGGATCCA AACTCCCCTC TGTACTCGGT GAAGTCTTTT GAAGAGCTTC
401 GGCTCCCACA GAACTTAATT GCCCAATCTC AGTCTGGTAC TGGTAAAACA
451 GCTGCCTTCG TGCTGGCCAT GCTTAGCCAA GTAGAACCTG CAAACAAATA
501 CCCCCAGTGT CTATGTCTCT CCCCAACGTA TGAGCTCGCC CTCCAAACAG
551 GAAAAGTGAT TGAACAAATG GGCAAATTTT ACCCTGAACT GAAGCTAGCT
601 TATGCTGTTC GAGGCAATAA ATTGGAAAGA GGCCAGAAGA TCAGTGAGCA
651 GATTGTCATT GGCACCCCTG GGACTGTGCT GGACTGGTGC TCCAAGCTCA
701 AGTTCATTGA TCCCAAGAAA ATCAAGGTGT TTGTTCTGGA TGAGGCTGAT
751 GTCATGATAG CCACTCAGGG CCACCAAGAT CAGAGCATCC GCATCCAGAG
801 GATGCTGCCC AGGAACTGCC AGATGCTGCT TTTCTCCGCC ACCTTTGAAG
851 ACTCTGTGTG GAAGTTTGCC CAGAAAGTGG TCCCAGACCC AAACGTTATC
901 AAACTGAAGC GTGAGGAAGA GACCCTGGAC ACCATCAAGC AGTACTATGT
951 CCTGTGCAGC AGCAGAGACG AGAAGTTCCA GGCCTTGTGT AACCTCTACG
1001 GGGCCATCAC CATTGCTCAA GCCATGATCT TCTGCCATAC TCGCAAAACA
1051 GCTAGTTGGC TGGCAGCAGA GCTCTCAAAA GAAGGCCACC AGGTGGCTCT
1101 GCTGAGTGGG GAGATGATGG TGGAACAGAG GGCTGCAGTG ATTGAGCGCT
1151 TCCGAGAGGG CAAAGAGAAG GTTTTGGTGA CCACCAACGT GTGTGCCCGC
1201 GGCATTGATG TTGAACAAGT GTCTGTCGTC ATCAACTTTG ATCTTCCCGT
1251 GGACAAGGAC GGGAATCCTG ACAATGAGAC CTACCTGCAC CGGATCGGGC
1301 GCACGGGCCG CTTTGGCAAG AGGGGCCTGG CAGTGAACAT GGTGGACAGC
1351 AAGCACAGCA TGAACATCCT GAACAGAATC CAGGAGCATT TTAATAAGAA
1401 GATAGAAAGA TTGGACACAG ATGATTTGGA CGAGATTGAG AAAATAGCCA
1451 ACTGAGAAGC TCCACCAGCC ACTGATGCCA GCCCTGGCAC TGCCCCTGCA
1501 CAGGAGACAA GTGCGTTCAG GGCACAGGCC CCGACATCAC CCCAAGGACA
1551 ACGGCACAAG TAGAGAGAAA CTACCTACCT CACTTCAAAT TATGTTTGGA
1601 CTTGACAAAA ATGTATGCAA ATGATGGGGG ATGGTAGAAA AAAATTATTT
1651 ACACAACCTT GGAAGATTAG GCATGAATAC ACAGAGATTT ACCTTTAAAA
1701 AAAAAAAAAA AAA



BLAST Results 
Entry G36496 from database EMBL:
SHGC-53094 Human Homo sapiens STS cDNA.
Length = 459
Minus Strand HSPs:

Score = 1693 (254.0 bits), Expect = 2.8e-70, P = 2.8e-70
Identities = 369/387 (95%), Positives = 369/387 (95%)

Entry G44014 from database EMBLNEW:

WIAF-3643-STS Human THudson SANGER Homo sapiens STS genomic, sequence
tagged site.

Score = 901, P = 2.3e-35, identities = 183/185



Medline entries 



94192995: 

Gene 1994 Mar 25;140(2):171-177

Mouse erythroid cells express multiple putative RNA helicase genes exhibiting
high sequence conservation from yeast to mammals.



Peptide information for frame 1 



ORF from 109 bp to 1452 bp; peptide length: 448 
Category: strong similarity to known protein 
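The peptide length in each "ORF from ... to ..." line follows from the ORF coordinates: the span is read as 1-based and inclusive and ends with the last sense codon, so the stop codon lies outside it (for this clone, a TGA at bp 1453-1455 follows the stated end at bp 1452). The Python sketch below, built around a hypothetical `peptide_length` helper, checks this arithmetic for the four clones described here.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residues encoded by an ORF given as 1-based inclusive coordinates, stop codon excluded."""
    span = orf_end - orf_start + 1
    if span % 3:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3


# (clone, ORF start, ORF end, peptide length stated in the report)
ORFS = [
    ("DKFZphfbr2_2kl4", 30, 1034, 335),
    ("DKFZphfbr2_3cl8", 109, 1452, 448),
    ("DKFZphfbr2_3fl6", 150, 530, 127),
    ("DKFZphfbr2_3g8", 40, 573, 178),
]

for clone, start, end, stated in ORFS:
    print(f"{clone}: ({end} - {start} + 1) / 3 = {peptide_length(start, end)} (stated {stated})")
```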



1 MATDSWALAV DEQEAAAESL SNLHLKEEKI KPDTNGAVVK TNANAEKTDE 
51 EEKEDRAAQS LLNKLIRSNL VDNTNQVEVL QRDPNSPLYS VKSFEELRLP 
101 QNLIAQSQSG TGKTAAFVLA MLSQVEPANK YPQCLCLSPT YELALQTGKV 
151 IEQMGKFYPE LKLAYAVRGN KLERGQKISE QIVIGTPGTV LDWCSKLKFI 
201 DPKKIKVFVL DEADVMIATQ GHQDQSIRIQ RMLPRNCQML LFSATFEDSV 
251 WKFAQKVVPD PNVIKLKREE ETLDTIKQYY VLCSSRDEKF QALCNLYGAI 
301 TIAQAMIFCH TRKTASWLAA ELSKEGHQVA LLSGEMMVEQ RAAVIERFRE 
351 GKEKVLVTTN VCARGIDVEQ VSVVINFDLP VDKDGNPDNE TYLHRIGRTG
401 RFGKRGLAVN MVDSKHSMNI LNRIQEHFNK KIERLDTDDL DEIEKIAN 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_3cl8, frame 1

PIR:I49731 RNA helicase - mouse, N = 2, Score = 1758, P = 3.8e-223

TREMBL:AF005239_1 gene: "Dbp80"; product: "DEAD-box helicase";
Drosophila melanogaster DEAD-box helicase (Dbp80) mRNA, complete cds.,
N = 2, Score = 1142, P = 1.8e-125

SWISSPROT:YB66_SCHPO PUTATIVE ATP-DEPENDENT RNA HELICASE C12C2.06., N =
2, Score = 911, P = 5.5e-103

PIR:S66920 probable RNA helicase CA5/6 - yeast (Saccharomyces
cerevisiae), N = 2, Score = 887, P = 1.9e-98



>PIR:I49731 RNA helicase - mouse
Length = 478

HSPs:

Score = 1758 (263.8 bits), Expect = 3.8e-223, Sum P(2) = 3.8e-223
Identities = 338/349 (96%), Positives = 349/349 (100%)

Query: 100 PQNLIAQSQSGTGKTAAFVLAMLSQVEPANKYPQCLCLSPTYELALQTGKVIEQMGKFYP 159 

PQNLIAQSQSGTGKTAAFVLAMLS+VEPA++YPQCLCLSPTYELALQTGKVIEQMGKF+P 
Sbjct: 130 PQNLIAQSQSGTGKTAAFVLAMLSRVEPADRYPQCLCLSPTYELALQTGKVIEQMGKFHP 189 

Query: 160 ELKLAYAVRGNKLERGQKISEQIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIAT 219 

ELKLAYAVRGNKLERGQK+SEQIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIAT 
Sbjct: 190 ELKLAYAVRGNKLERGQKVSEQIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIAT 249 

Query: 220 QGHQDQSIRIQRMLPRNCQMLLFSATFEDSVWKFAQKVVPDPNVIKLKREEETLDTIKQY 279 
QGHQDQSIRIQR++PRNCQMLLFSATFEDSVWKFAQKVVPDPN+IKLKREEETLDTIKQY
Sbjct: 250 309

Query: 280 YVLCSSRDEKFQALCNLYGAITIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVE 339

YVLC++R+EKFQALCNLYGAITIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVE
Sbjct: 310 YVLCNNREEKFQALCNLYGAITIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVE 369

Query: 340 QRAAVIERFREGKEKVLVTTNVCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRT 399

QRAAVIERFREGKEKVLVTTNVCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRT
Sbjct: 370 QRAAVIERFREGKEKVLVTTNVCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRT 429

Query: 400 GRFGKRGLAVNMVDSKHSMNILNRIQEHFNKKIERLDTDDLDEIEKIAN 448

GRFGKRGLAVNMVDSKHSMNILNRIQEHFNKKIERLDTDDLDEIEKIAN
Sbjct: 430 GRFGKRGLAVNMVDSKHSMNILNRIQEHFNKKIERLDTDDLDEIEKIAN 478

Score = 419 (62.9 bits), Expect = 3.8e-223, Sum P(2) = 3.8e-223
Identities = 94/136 (69%), Positives = 104/136 (76%)

Query: 1 MATDSWALAVDEQEAAAESLSNLHLKEEKIKPDTNGAVVKTNANAEKTDEEEKEDRAAQS 60

MATDSWALAVDEQEAA +S+S+L +KEEK K DTNG V+KT+ AEKT+EEEKEDRAAQS
Sbjct: 1 MATDSWALAVDEQEAAVKSMSSLQIKEEKAKSDTNG-VIKTSTTAEKTEEEEKEDRAAQS 59

Query: 61 LLNKLIRSNLVDNTNQVEVLQRDPNSPLYSVKSFEELRL-PQNL---IAQSQSGTGKTAA 116

LLNKLIRSNLVDNTNQVEVLQRDP+SPLYSVKSFEELRL PQ L A + K
Sbjct: 60 LLNKLIRSNLVDNTNQVEVLQRDPSSPLYSVKSFEELRLKPQLLQGVYAMGFNRPSKIQE 119

Query: 117 FVLAMLSQVEPANKYPQ 133

L M+ P N Q
Sbjct: 120 NALPMMLAEPPQNLIAQ 136





Pedant information for DKFZphfbr2_3cl8, frame 1

Report for DKFZphfbr2_3cl8.1

[LENGTH] 448

[MW] 50490.07

[pI] 5.83

[HOMOL] PIR:I49731 RNA helicase - mouse 0.0

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 1e-102
[FUNCAT] 04.01.04 rRNA processing [S. cerevisiae, YDR021w] 2e-65
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR021w] 2e-65
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YJL138c] 1e-63
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YJL138c] 1e-63
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 2e-49
[FUNCAT] j mRNA translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 9e
[FUNCAT] 04.05.03 mRNA processing (splicing) [S. cerevisiae, YDL084w] 1e-43
[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 3e-39
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 1e-35
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 9e-27
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 8e-26
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 1e-23
[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 9e-08
[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 1e-05
[FUNCAT] 03.19 recombination and DNA repair [S. cerevisiae, YMR190c] 1e-05
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIR002c] 7e-04

[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins

[PIRKW] nucleus 4e-64
[PIRKW] RNA binding 1e-64
[PIRKW] DEAD box 4e-64
[PIRKW] transmembrane protein 3e-22
[PIRKW] DNA binding 2e-32
[PIRKW] ATP 1e-101
[PIRKW] purine nucleotide binding 4e-64
[PIRKW] P-loop 1e-101
[PIRKW] hydrolase 4e-43
[PIRKW] protein biosynthesis 1e-64
[PIRKW] ATP binding 2e-35

[SUPFAM] WW repeat homology 3e-29
[SUPFAM] translation initiation factor eIF-4A 1e-64
[SUPFAM] DEAD/H box helicase homology 1e-101
[SUPFAM] DNA helicase recG 2e-06
[SUPFAM] unassigned DEAD/H box helicases 1e-101
[SUPFAM] ATP-dependent RNA helicase DBP1 9e-33
[SUPFAM] ATP-dependent RNA helicase DHH1 4e-48
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 3e-29

[PROSITE] MYRISTYL 5
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 8
[PROSITE] ASN_GLYCOSYLATION 1

[PFAM] Helicases conserved C-terminal domain
[PFAM] DEAD and DEAH box helicases

[KW] Alpha_Beta


SEQ MATDSWALAVDEQEAAAESLSNLHLKEEKIKPDTNGAVVKTNANAEKTDEEEKEDRAAQS 

PRD ccchhhhhhhhhhhhhhhhcccchhhhhhhcccccceeeeeehhhhhhhhhhhhhhhhhh 

SEQ LLNKLIRSNLVDNTNQVEVLQRDPNSPLYSVKSFEELRLPQNLIAQSQSGTGKTAAFVLA 

PRD hhhhhhhhhcccccceeeeeeccccccceeehhhhhhhhccceeeeeccccccchhhhhh 

SEQ MLSQVEPANKYPQCLCLSPTYELALQTGKVIEQMGKFYPELKLAYAVRGNKLERGQKISE 

PRD hhhhhhhhhccceeeeeccchhhhhhhhhhhhhhccccccccceeeccccchhhhhhhhe 

SEQ QIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIATQGHQDQSIRIQRMLPRNCQML 

PRD eeeecccccchhhhhhhhhhcccceeeeeecchhhhhhhccchhhhhhhhhhccccceee 

SEQ LFSATFEDSVWKFAQKVVPDPNVIKLKREEETLDTIKQYYVLCSSRDEKFQALCNLYGAI

PRD eeeccccchhhhhhhhhhcccceeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhch 

SEQ TIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVEQRAAVIERFREGKEKVLVTTN 

PRD hhhhhheeecchhhhhhhhhhhhhccceeeeecccchhhhhhhhhhhhccccceeeeeec 

SEQ VCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRTGRFGKRGLAVNMVDSKHSMNI 

PRD ccccccceeeeeeeeecccccccccccccceeeeeecccccccccceeeeeeeccchhhh 

SEQ LNRIQEHFNKKIERLDTDDLDEIEKIAN 

PRD hhhhhhhhhhhccccccccchhhhhccc 



Prosite for DKFZphfbr2_3cl8.1

PS00001 389->393 ASN_GLYCOSYLATION PDOC00001

PS00002 109->113 GLYCOSAMINOGLYCAN PDOC00002

PS00005 90->93 PKC_PHOSPHO_SITE PDOC00005

PS00005 111->114 PKC_PHOSPHO_SITE PDOC00005

PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005

PS00005 226->229 PKC_PHOSPHO_SITE PDOC00005

PS00005 275->278 PKC_PHOSPHO_SITE PDOC00005

PS00005 284->287 PKC_PHOSPHO_SITE PDOC00005

PS00005 311->314 PKC_PHOSPHO_SITE PDOC00005

PS00005 399->402 PKC_PHOSPHO_SITE PDOC00005

PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006

PS00006 93->97 CK2_PHOSPHO_SITE PDOC00006

PS00006 123->127 CK2_PHOSPHO_SITE PDOC00006

PS00006 189->193 CK2_PHOSPHO_SITE PDOC00006

PS00006 245->249 CK2_PHOSPHO_SITE PDOC00006

PS00006 284->288 CK2_PHOSPHO_SITE PDOC00006

PS00008 110->116 MYRISTYL PDOC00008

PS00008 175->181 MYRISTYL PDOC00008

PS00008 185->191 MYRISTYL PDOC00008

PS00008 385->391 MYRISTYL PDOC00008

PS00008 406->412 MYRISTYL PDOC00008

PS00009 402->406 AMIDATION PDOC00009



Pfam for DKFZphfbr2_3cl8.1

HMM_NAME DEAD and DEAH box helicases

HMM * g LpPWI LRn IyeMG FEkPTPI QQqA IPilLeG RDVMACAQTGSG K
++ ++ +N ++ P E+ +++A++Q+G+GK
Query 65 LIRSNLVDNTNQVEVLQRDPNSPLYSVKSFEELRLPQNLIAQSQSGTGK 113

HMM TAAFlIPMLQHIDwdPWpqpPQdPrALILAPTRELAMQIQEEcRkFgkHM
TAAF++ ML+++ + + PQ +L L+PT ELA+Q+ ++++++GK++
Query 114 TAAFVLAMLSQVEPAN--KYPQCLCLSPTYELALQTGKVIEQMGKFY 158

HMM ngl RImcI YGGtnMRdQMRmLeRGpPHI VIATPGRLI DHIER . gtldLDr
++++++ ++ +++ +++ +IVI+TPG ++D + +D ++
Query 159 PELKLAYAVRGNKLERGQKISEQIVIGTPGTVLDWCSKLKFIDPKK 204

HMM IeMLVMDEADRMLD.MGFIDQIRrlMrqlPMpwNRQTMMFSATMPdelqE
I+++V+DEAD M+ +G +DQ RI R++P +N Q ++FSAT+ D++ +
Query 205 IKVFVLDEADVMIATQGHQDQSIRIQRMLP--RNCQMLLFSATFEDSVWK 252

HMM LARrFMRNPIRInldMdElTtnEnlkQwYiyVerEMWKfdcLcrLIe*
+A ++ +P I ++++E T++ +IKQ+Y+ + + ++KF +LC+L++
Query 253 FAQKVVPDPNVIKLKREEETLD-TIKQYYVLCSSRDEKFQALCNLYG 298



HMM_NAME Helicases conserved C-terminal domain

HMM *EileeWLknlGIrvmYIHGdMpQeERdeIMddFNnGEynVLIcTDVggR 
+L+ +L+++G +V+ + G M+ E+R ++++F++G+ +VL++T+V +R 
Query 316 SWLAAELSKEGHQVALLSGEMMVEQRAAVIERFREGKEKVLVTTNVCAR 364 

HMM GIDIPdVNHVINYDM PWNPEq. . YIQRIGRTgRIG* 

GID+++V++VIN+D+ + NP++ Y++RIGRTGR+G 
Query 365 GIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRTGRFG 403 



Medline

PMID: 10322435
"Unwinding RNA in ... DEAD-box proteins and related families." de la Cruz J, Kressler D, Linder P

DKFZphfbr2_3fl6



group: brain derived 

DKFZphfbr2_3fl6 encodes a novel 127 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 1514 bp 

Poly A stretch at pos. 1454, polyadenylation signal at pos. 1434



1 GGGGGGACTG GAGAAGGGAG GCGGCGGGCG AAGCGCACGT CGAGCGGGGG 

51 AGCGGCGCTG CCTGTGGAGA TCCGCGGAGG CCGACAGGAT TCGTTGGCTG 

101 CCGTCCCCGC TGCTGTGCAT TGGGTTAAAA ACGACAACCA ACATCAGCCA 

151 TGAAAGATCC AAGTCGCAGC AGTACTAGCC CAAGCATCAT CAATGAAGAT 

201 GTGATTATTA ACGGTCATTC TCATGAAGAT GACAATCCAT TTGCAGAGTA 

251 CATGTGGATG GAAAATGAAG AAGAATTCAA CAGACAAATA GAAGAGGAGT 

301 TATGGGAAGA AGAATTTATT GAACGCTGTT TCCAAGAAAT GCTGGAAGAG 

351 GAAGAAGAGC ATGAATGGTT TATTCCAGCT CGAGATCTCC CACAAACTAT 

401 GGACCAAATC CAAGACCAGT TTAATGACCT TGTTATCAGT GAAGGCTCTT 

451 CTCTGGAAGA TCTTGTGGTC AAGAGCAATC TGAATCCAAA TGCAAAGGAG 

501 TTTGTTCCTG GGGTGAAGTA CGGAAATATT TGAGTAGACG GGGCCCTCTT 

551 TTGGTGGATG TAGCACAATT TCCACACTGT GAAGGCAGTA TTAGAAGACT 

601 TAATTGTAAA AGCACTCTTG TCACTGTGTT ACACTTATGC ATTGCCAAAG 

651 TTTTTGTTAG TCTTGCATGC TTAATAAAAG TGCTGAGACT GTTACTAAGT 

701 AAAAAGCTGT CAAACATTTA CTGAAAATAG AATTGGCCCC ATGGCTTGAT 

751 GTGAAGACAG CAAGGAAAGA AGCACCAGTC AAGTTGTGAA CAAGCACCAA 

801 ATTAAAAGAC CTAAACCTTA CCAAATTGTC TTTTTTTGAG GCTAATCTAT 

851 CACTTGTTAA TGTCTAAACT TTAAAATCAG TACATTTAAT TTGAGTTCCA 

901 ACTGTTAAGC ATATTTCTCA GACTTAAATT TGATTATGTC CCCATCAAAA 

951 AGAATCTCCA TTTTCTGAAG GTCTGTTAGT TAATTTGAGA TAATTTGTTA 

1001 AAGGCAAGTA TGTCATATTA CTGAGGCTAC AAGTTAGTCA GCAGATGAGT 

1051 GCCAGTCCAG CCTTTTCCGG TATGTTATTG TTAGAAATAT TGAGTTCTAA 

1101 TGTTACATCT GAGGAAGTAT GTAATTTGAG AATTGTAACT TCTAAGGGAT 

1151 TCACTGCATC ATAGCTATGC CTGTATGGAG TCTAACATAT GACCAATACC 

1201 AACCCATAAT CCAGCTGAAC AAAGATACTG TAACATTATG ATTTGAGTGG 

1251 TGCTTTTCCT TGCTTTGTTA ACCATCACGA GAGTCTGCAG CACAACTTTT 

1301 AACAAAGCTA GAACAGTTTT GGCTTCTTAA ACTTCATATT TGGGTAGGTT 

1351 AAGCTGCCAT ACGTGTTCAG TGTGAATAGT GTTTAAGTTG AAAATATTGT 

1401 AAAAAAATTA TATTTTTTCA AAAATATTTA AAAAAATAAA TAATAGTAGA 

1451 ACTGAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGAAAAA 

1501 AAAAAAAAAA AAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 150 bp to 530 bp; peptide length: 127 
Category: putative protein 



1 MKDPSRSSTS PSIINEDVII NGHSHEDDNP FAEYMWMENE EEFNRQIEEE 
51 LWEEEFIERC FQEMLEEEEE HEWFIPARDL PQTMDQIQDQ FNDLVISEGS
101 SLEDLVVKSN LNPNAKEFVP GVKYGNI 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_3fl6, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_3fl6, frame 3

Report for DKFZphfbr2_3fl6.3



[LENGTH] 127 

[MW] 14998.41 

[pI] 4.04

[BLOCKS] BL01269D 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 2 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 27.56 %



SEQ MKDPSRSSTSPSIINEDVIINGHSHEDDNPFAEYMWMENEEEFNRQIEEELWEEEFIERC 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccceeeecccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ FQEMLEEEEEHEWFIPARDLPQTMDQIQDQFNDLVISEGSSLEDLVVKSNLNPNAKEFVP

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhccccccccchhhhhhhhhcceeeecccccceeeeecccccccccccc

SEQ GVKYGNI 

SEG 

PRD ccccccc 



Prosite for DKFZphfbr2_3fl6.3

PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 

PS00006 100->104 CK2_PHOSPHO_SITE PDOC00006 

PS00008 121->127 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphfbr2_3fl6.3)
DKFZphfbr2_3g8



group: metabolism 

DKFZphfbr2_3g8.1 encodes a novel 178 amino acid protein with similarity to yeast ARD1 protein. 

In yeast, ARD1 and NAT1 are required for the expression of an N-terminal protein
acetyltransferase 1. NAT1 controls full repression of the silent mating-type locus HML,
sporulation and entry into G0. ARD1 is involved in the assembly of the NAT1 complex. The new
protein could be part of this or another NAT complex.

The new protein can find application in modulating NAT assembly and activity and may therefore be
important in the metabolism of drugs and environmental mutagens.

strong similarity to N-TERMINAL ACETYLTRANSFERASE COMPLEX ARD1 homolog
complete cDNA, complete cds, start at Bp 40, EST hits
Sequenced by AGOWA 
Locus: /map-'^O" 
Insert length: 1030 bp 

Poly A stretch at pos. 1013, no polyadenylation signal found

1 TGGGCTTGGC GAACGGTCTT CGGAAGCGGC GGCGGCGCGA TGACCACGCT 
51 ACGGGCCTTT ACCTGCGACG ACCTGTTCCG CTTCAACAAC ATTAACTTGG 
101 ATCCACTTAC AGAAACTTAT GGGATTCCTT TCTACCTACA ATACCTCGCC 
151 CACTGGCCAG AGTATTTCAT TGTTGCAGTG GCACCTGGTG GAGAATTAAT 
201 GGGTTATATT ATGGGTAAAG CAGAAGGCTC AGTAGCTAGG GAAGAATGGC 
251 ACGGGCACGT CACAGCTCTG TCTGTTGCCC CAGAATTTCG ACGCCTTGGT 
301 TTGGCTGCTA AACTTATGGA GTTACTAGAG GAGATTTCAG AAAGAAAGGG 
351 TGGGTTTTTT GTGGATCTCT TTGTAAGAGT ATCTAACCAA GTTGCAGTTA 
401 ACATGTACAA GCAGTTGGGC TACAGTGTAT ATAGGACGGT CATAGAGTAC 
451 TATTCGGCCA GCAACGGGGA GCCTGATGAG GACGCTTATG ATATGAGGAA 
501 AGCACTTTCC AGGGATACTG AGAAGAAATC CATCATACCA TTACCTCATC 
551 CTGTGAGGCC TGAAGACATT GAATAACCCT GGGCAGTGGT TCTTAGGCAG 
601 ATACTCTAGA TGCTTTATGG ACAATATTAT TTTCATTGGA TGATTCTGGA 
651 GCTCTATTAG GAGAAAAGTA ATCATTTTAG GTCTTAAAGA CTTCAAGAAA 
701 ATACAGGTTA TCAATTTATT TTAAATCTCA TTGTTTCCAG TTAGCAATAT 
751 CATACCTATT AAAGCTGTTC ATTGTAACAA AATTCAATCA AAAAGGCAGC 
801 TAGGTCAGAA GGAAACATAC CACTCTCATG GTTCATAGTA TTCACTGTAT 
851 GTATGCTAGG GAAAAGACTT GCTCCAGTCT CCTCCTCAGT TCTGTGCCTG 
901 AGAACCACTG CTGCATATAT TTGTTTTTAA ATTTTGTATT GAACTGTTAA 
951 TTGAAGCTTT AAAAGCATAT ATGAAATGTA TAAATCTAAG ATGTATAATA 
1001 CATTATTGAC TCCAAAAAAA AAAAAAAAAA 



BLAST Results 



Entry HSG0101 from database EMBL: 
human STS SHGC-35956. 
Length = 401
Minus Strand HSPs:

Score = 1417 (212.6 bits), Expect = 9.3e-58, P = 9.3e-58
Identities = 301/311 (96%)



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 40 bp to 573 bp; peptide length: 178 
Category: strong similarity to known protein 



1 MTTLRAFTCD DLFRFNNINL DPLTETYGIP FYLQYLAHWP EYFIVAVAPG 
51 GELMGYIMGK AEGSVAREEW HGHVTALSVA PEFRRLGLAA KLMELLEEIS 
101 ERKGGFFVDL FVRVSNQVAV NMYKQLGYSV YRTVIEYYSA SNGEPDEDAY
151 DMRKALSRDT EKKSIIPLPH PVRPEDIE 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_3g8, frame 1

TREMBL:SPCC16C4_12 gene: "SPCC16C4.12"; product: "putative n-terminal
acetyltransferase complex subunit"; S.pombe chromosome III cosmid
c16C4., N = 1, Score = 475, P = 3.2e-45

SWISSPROT:ARDH_LEIDO N-TERMINAL ACETYLTRANSFERASE COMPLEX ARD1 SUBUNIT
HOMOLOG., N = 1, Score = 451, P = 1.1e-42

PIR:S69021 hypothetical protein YPR131C - yeast (Saccharomyces
cerevisiae), N = 1, Score = 382, P = 2.3e-35

>TREMBL:SPCC16C4_12 gene: "SPCC16C4.12"; product: "putative n-terminal
acetyltransferase complex subunit"; S.pombe chromosome III cosmid c16C4.
Length = 180

HSPs:

Score = 475 (71.3 bits), Expect = 3.2e-45, P = 3.2e-45
Identities = 96/165 (58%), Positives = 118/165 (71%)

Query: 1 MTTLRAFTCDDLFRFNNINLDPLTETYGIPFYLQYLAHWPEYFIVAVAPGGE--LMGYIM 58 

MT R F DLF FNNINLDPLTET+ I FYL YL WP +V + + LMGYIM 
Sbjct: 1 MTDTRKFKATDLFSFNNINLDPLTETFNISFYLSYLNKWPSLCVVQESDLSDPTLMGYIM 60

Query: 59 GKAEGSVAREEWHGHVTALSVAPEFRRLGLAAKLMELLEEISERKGGFFVDLFVRVSNQV 118 

GK+EG+ +EWH HVTA++VAP RRLGLA +M+ LE + + FFVDLFVR SN + 
Sbjct: 61 GKSEGT--GKEWHTHVTAITVAPNSRRLGLARTMMDYLETVGNSENAFFVDLFVRASNAL 118 

Query: 119 AVNMYKQLGYSVYRTVIEYYSASNGEPDEDAYDMRKALSRDTEKKSI 165 

A++ YK LGYSVYR VI YYS +G+ DED++DMRK LSRD ++SI 
Sbjct: 119 AIDFYKGLGYSVYRRVIGYYSNPHGK-DEDSFDMRKPLSRDVNRESI 164 



Pedant information for DKFZphfbr2_3g8, frame 1



Report for DKFZphfbr2_3g8.1



[LENGTH] 178

[MW] 20338.24

[PI] 5.06

[HOMOL] TREMBL:SPCC16C4_12 gene: "SPCC16C4.12"; product: "putative n-terminal acetyltransferase complex subunit"; S.pombe chromosome III cosmid c16C4. 7e-47

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YPR131c] 6e-37

[FUNCAT] 01.06.07 lipid, fatty-acid and sterol utilization [S. cerevisiae, YHR013c] 4e-14

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YHR013c] 4e-14

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YHR013c] 4e-14

[FUNCAT] r general function prediction [M. jannaschii, MJ1530] 6e-09

[PIRKW] acyltransferase 1e-12

[SUPFAM] arrest-defective protein 1 1e-12

[SUPFAM] Escherichia coli peptide N-acetyltransferase rimI 1e-07

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 3

[KW] Alpha_Beta



SEQ MTTLRAFTCDDLFRFNNINLDPLTETYGIPFYLQYLAHWPEYFIVAVAPGGELMGYIMGK 

PRD ccccccccccchhhhhhcccccccccccchhhhhhcccccceeeeeeccccceeeehhhh 

SEQ AEGSVAREEWHGHVTALSVAPEFRRLGLAAKLMELLEEISERKGGFFVDLFVRVSNQVAV 

PRD hcccccccccccceeeeehhhhhhhhcchhhhhhhhhhhhhhccceeeeeeeecchhhhh 

SEQ NMYKQLGYSVYRTVIEYYSASNGEPDEDAYDMRKALSRDTEKKSIIPLPHPVRPEDIE 

PRD hhhhhhcccchhhhhhccccccccccchhhhhhhhhhhhhhhhhcccccccccccccc 



Prosite for DKFZphfbr2_3g8.1



PS00005 3->6 PKC_PHOSPHO_SITE PDOC00005

PS00005 100->103 PKC_PHOSPHO_SITE PDOC00005

PS00005 160->163 PKC_PHOSPHO_SITE PDOC00005

PS00006 8->12 CK2_PHOSPHO_SITE PDOC00006

PS00006 133->137 CK2_PHOSPHO_SITE PDOC00006

PS00006 141->145 CK2_PHOSPHO_SITE PDOC00006



(No Pfam data available for DKFZphfbr2_3g8.1)
DKFZphfbr2_3l2



group: brain derived 

DKFZphfbr2_3l2 encodes a novel 589 amino acid protein with weak similarity to S. cerevisiae 
ubiquitin-like protein DSK2. 

Pfam predicts similarity to the ubiquitin family; no informative BLAST
results; no predictive PROSITE or SCOP motif.

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to ubiquitin-like protein DSK2 yeast 
complete cDNA, complete cds, EST hits 

Dsk2p is involved in spindle pole body (SPB) duplication, SPB - centromere
strong similarity to HRIHFB2157 human mRNA 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2978 bp 

Poly A stretch at pos. 2958, polyadenylation signal at pos. 2924 



1 GGGGGGAGGA AGCGGTGGCT GCTGCGGATG TCGGTGTGAG CGAGCGGCGC 
51 CTGAACACAC GGCGGCTGCC GAGCGCCTGA CCCGGGCCTG CGCCAGAGCC 
101 TGCACCGAGC TCCGGGGCCC CACACCCGCT ACGGTGGCCC TGCGCCCGTT 
151 GCTACTGAGG CGGCGTGCTC TGCATTCTTC GCTGTCCAGG CCTGCCGGCT 
201 CTGGTGTCTG CTGGCTCCTC CTTGCTCGCC TGCTCCCTCC TGCTTGCCTG 
251 AGTCACCGCC GCCGCCGCCG CCACAGCCAT GGCCGAGAGT GGTGAAAGCG 
301 GCGGTCCTCC GGGCTCCCAG GATAGCGCCG CCGGAGCCGA AGGTGCTGGC 
351 GCCCCCGCGG CCGCTGCCTC CGCGGAGCCC AAAATCATGA AAGTCACCGT 
401 GAAGACCCCG AAGGAAAAGG AGGAATTCGC CGTGCCCGAG AATAGCTCCG 
451 TCCAGCAGTT TAAGGAAGAA ATCTCTAAAC GTTTTAAATC ACATACTGAC
501 CAACTTGTGT TGATATTTGC TGGAAAAATT TTGAAAGATC AAGATACCTT 
551 GAGTCAGCAT GGAATTCATG ATGGACTTAC TGTTCACCTT GTCATTAAAA 
601 CACAAAACAG GCCTCAGGAT CATTCAGCTC AGCAAACAAA TACAGCTGGA 
651 GGCAATGTTA CTACATCATC AACTCCTAAT AGTAACTCTA CATCTGGTTC
701 TGCTACTAGC AACCCTTTTG GTTTAGGTGG CCTTGGGGGA CTTGCAGGTC 
751 TGAGTAGCTT GGGTTTGAAT ACTACCAACT TCTCTGAACT ACAGAGTCAG
801 ATGCAGCGAC AACTTTTGTC TAACCCTGAA ATGATGGTCC AGATCATGGA
851 AAATCCCTTT GTTCAGAGCA TGCTCTCAAA TCCTGACCTG ATGAGACAGT 
901 TAATTATGGC CAATCCACAA ATGCAGCAGT TGATACAGAG AAATCCAGAA
951 ATTAGTCATA TGTTGAATAA TCCAGATATA ATGAGACAAA CGTTGGAACT 
1001 TGCCAGGAAT CCAGCAATGA TGCAGGAGAT GATGAGGAAC CAGGACCGAG 
1051 CTTTGAGCAA CCTAGAAAGC ATCCCAGGGG GATATAATGC TTTAAGGCGC 
1101 ATGTACACAG ATATTCAGGA ACCAATGCTG AGTGCTGCAC AAGAGCAGTT 
1151 TGGTGGTAAT CCATTTGCTT CCTTGGTGAG CAATACATCC TCTGGTGAAG 
1201 GTAGTCAACC TTCCCGTACA GAAAATAGAG ATCCACTACC CAATCCATGG 
1251 GCTCCACAGA CTTCCCAGAG TTCATCAGCT TCCAGCGGCA CTGCCAGCAC 
1301 TGTGGGTGGC ACTACTGGTA GTACTGCCAG TGGCACTTCT GGGCAGAGTA 
1351 CTACTGCGCC AAATTTGGTG CCTGGAGTAG GAGCTAGTAT GTTCAACACA 
1401 CCAGGAATGC AGAGCTTGTT GCAACAAATA ACTGAAAACC CACAACTGAT 
1451 GCAAAACATG TTGTCTGCCC CCTACATGAG AAGCATGATG CAGTCACTAA 
1501 GCCAGAATCC TGACCTTGCT GCACAGATGA TGCTGAATAA TCCCCTATTT 
1551 GCTGGAAATC CTCAGCTTCA AGAACAAATG AGACAACAGC TCCCAACTTT 
1601 CCTCCAACAA ATGCAGAATC CTGATACACT ATCAGCAATG TCAAACCCTA 
1651 GAGCAATGCA GGCCTTGTTA CAGATTCAGC AGGGTTTACA GACATTAGCA 
1701 ACGGAAGCCC CGGGCCTCAT CCCAGGGTTT ACTCCTGGCT TGGGGGCATT 
1751 AGGAAGCACT GGAGGCTCTT CGGGAACTAA TGGATCTAAC GCCACACCTA 
1801 GTGAAAACAC AAGTCCCACA GCAGGAACCA CTGAACCTGG ACATCAGCAG 
1851 TTTATTCAGC AGATGCTGCA GGCTCTTGCT GGAGTAAATC CTCAGCTACA 
1901 GAATCCAGAA GTCAGATTTC AGCAACAACT GGAACAACTC AGTGCAATGG 
1951 GATTTTTGAA CCGTGAAGCA AACTTGCAAG CTCTAATAGC AACAGGAGGT 
2001 GATATCAATG CAGCTATTGA AAGGTTACTG GGCTCCCAGC CATCATAGCA 
2051 GCATTTCTGT ATCTTGAAAA AATGTAATTT ATTTTTGATA ACGGCTCTTA 
2101 AACTTTAAAA TACCTGCTTT ATTTCATTTT GACTCTTGGA ATTCTGTGCT 
2151 GTTATAAACA AACCCAATAT GATGCATTTT AAGGTGGAGT ACAGTAAGAT 
2201 GTGTGGGTTT TTCTGTATTT TTCTTTTCTG GAACAGTGGG AATTAAGGCT 
2251 ACTGCATGCA TCACTTCTGC ATTTATTGTA ATTTTTTAAA AACATCACCT 
2301 TTTATAGTTG GGTGACCAGA TTTTGTCCTG CATCTGTCCA GTTTATTTGC 
2351 TTTTTAAACA TTAGCCTATG GTAGTAATTT ATGTAGAATA AAAGCATTAA 
2401 AAAGAAGCAA ATCATTTGCA CTCTATAATT TGTGGTACAG TATTGCTTAT 
2451 TGTGACTTTG GCATGCATTT TTGCAAACAA TGCTGTAAGA TTTATACTAC 
2501 TGATAATTTT GTTTTATTTG TATACAATAT AGAGTATGCA CATTTGGGAC 
2551 TGCATTTCTG GAAACATACT GCAATAGGCT CTCTGAGCAA AACACCTGTA
2601 ACTAAAAAAG TGAAGATAAG AAAATACTCT TAAAGCTGAG TATTTCCTAA 
2651 TTGTATAGAA TCTTACAGCA TCTTTGACAA ACATCTCCCA GCAAAAGTGC 
2701 CGGTTAGTCA GGTTTGTTGA AAATACAGTA GAAAAGCTGA TTCTGGTTAT 
2751 CTCTTTAAGG ACAATTAATT GTACAGACAC ATAATGTAAC ATTGTCTCAA 
2801 CATTCATTCA CAGATTGACT GTAAATTACC TTAATCTTTG TGCAGACTGA 
2851 AGGAACACTG TAGTATACCC CAAAGTGCAT TTGCCTAGGA CTTCTCAGCT 
2901 TCTCCCATAG GTAGTTTAAC AGGCATTAAA ATTTGTAATT GAAATGTTGC 
2951 TTTCACTCAA AAAAAAAAAA AAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 279 bp to 2045 bp; peptide length: 589 
Category: similarity to known protein 



1 MAESGESGGP PGSQDSAAGA EGAGAPAAAA SAEPKIMKVT VKTPKEKEEF 
51 AVPENSSVQQ FKEEISKRFK SHTDQLVLIF AGKILKDQDT LSQHGIHDGL 
101 TVHLVIKTQN RPQDHSAQQT NTAGGNVTTS STPNSNSTSG SATSNPFGLG 
151 GLGGLAGLSS LGLNTTNFSE LQSQMQRQLL SNPEMMVQIM ENPFVQSMLS 
201 NPDLMRQLIM ANPQMQQLIQ RNPEISHMLN NPDIMRQTLE LARNPAMMQE 
251 MMRNQDRALS NLESIPGGYN ALRRMYTDIQ EPMLSAAQEQ FGGNPFASLV 
301 SNTSSGEGSQ PSRTENRDPL PNPWAPQTSQ SSSASSGTAS TVGGTTGSTA 
351 SGTSGQSTTA PNLVPGVGAS MFNTPGMQSL LQQITENPQL MQNMLSAPYM 
401 RSMMQSLSQN PDLAAQMMLN NPLFAGNPQL QEQMRQQLPT FLQQMQNPDT 
451 LSAMSNPRAM QALLQIQQGL QTLATEAPGL IPGFTPGLGA LGSTGGSSGT 
501 NGSNATPSEN TSPTAGTTEP GHQQFIQQML QALAGVNPQL QNPEVRFQQQ 
551 LEQLSAMGFL NREANLQALI ATGGDINAAI ERLLGSQPS 

BLASTP hits 

Entry CE1_1 from database TREMBL: 

"F15CU.2"; Caenorhabditis elegans cosmid VF15C11L 
Length = 293 

Score = 454 (159.8 bits), Expect = 4.4e-43, P = 4.4e-43 
Identities = 81/162 (50%), Positives = 113/162 (69%)

Entry S54583 from database PIR:

ubiquitin-like protein DSK2 - yeast (Saccharomyces cerevisiae)
Length = 373

Score = 278 (97.9 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 100/307 (32%), Positives = 155/307 (50%)

Entry AB015344_1 from database TREMBLNEW:

gene: "HRIHFB2157"; Homo sapiens HRIHFB2157 mRNA, partial cds.
Score = 1135, P = 3.6e-115, identities = 227/301, positives = 253/301



Alert BLASTP hits for DKFZphfbr2_3l2, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_3l2, frame 3



Report for DKFZphfbr2_3l2.3



[LENGTH] 589

[MW] 62489.22

[PI] 5.02

[HOMOL] TREMBL:AB015344_1 gene: "HRIHFB2157"; Homo sapiens HRIHFB2157 mRNA, partial

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR276w] 2e-17

[FUNCAT] 30.10 nuclear organization [S. cerevisiae

[BLOCKS] BL00299 Ubiquitin family proteins

[SUPFAM] unassigned ubiquitin-related proteins 5e-16

[SUPFAM] ubiquitin homology 5e-16

[PROSITE] MYRISTYL 24

[PROSITE] CK2_PHOSPHO_SITE 9

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 7

[PFAM] Ubiquitin family

[KW] Irregular

[KW] 3D

[KW] LOW_COMPLEXITY 23.43 %


SEQ MAESGESGGPPGSQDSAAGAEGAGAPAAAASAEPKIMKVTVKTPKEKEEFAVPENSSVQQ

SEG . . xxxxxxxxxxx . . xxxxxxxxxxxxxxxxxxx . . . xxxxxxxxxxxx

1aarA CEEEEEETTTCEEEECTTTTBHHH

SEQ FKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQDHSAQQT

SEG

1aarA HHHHHHHHHCCCGGGEEEEETTEECTTTTBGGGGCCTTTTEEEEEBC

SEQ NTAGGNVTTSSTPNSNSTSGSATSNPFGLGGLGGLAGLSSLGLNTTNFSELQSQMQRQLL

SEG . . .xxxxxxxxxxxxxxxxxxxxxx. .xxxxxxxxxxxxxxxx

1aarA

SEQ SNPEMMVQIMENPFVQSMLSNPDLMRQLIMANPQMQQLIQRNPEISHMLNNPDIMRQTLE

SEG

1aarA

SEQ LARNPAMMQEMMRNQDRALSNLESIPGGYNALRRMYTDIQEPMLSAAQEQFGGNPFASLV

SEG

1aarA

SEQ SNTSSGEGSQPSRTENRDPLPNPWAPQTSQSSSASSGTASTVGGTTGSTASGTSGQSTTA

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

1aarA

SEQ PNLVPGVGASMFNTPGMQSLLQQITENPQLMQNMLSAPYMRSMMQSLSQNPDLAAQMMLN

SEG

1aarA

SEQ NPLFAGNPQLQEQMRQQLPTFLQQMQNPDTLSAMSNPRAMQALLQIQQGLQTLATEAPGL

SEG

1aarA

SEQ IPGFTPGLGALGSTGGSSGTNGSNATPSENTSPTAGTTEPGHQQFIQQMLQALAGVNPQL

SEG . . . .xxxxxxxxxxxxxxxxxxxxxxxx

1aarA

SEQ QNPEVRFQQQLEQLSAMGFLNREANLQALIATGGDINAAIERLLGSQPS

SEG

1aarA



Prosite for DKFZphfbr2_3l2.3



PS00001 55->59 ASN_GLYCOSYLATION PDOC00001

PS00001 126->130 ASN_GLYCOSYLATION PDOC00001

PS00001 136->140 ASN_GLYCOSYLATION PDOC00001

PS00001 164->168 ASN_GLYCOSYLATION PDOC00001

PS00001 167->171 ASN_GLYCOSYLATION PDOC00001

PS00001 302->306 ASN_GLYCOSYLATION PDOC00001

PS00001 501->505 ASN_GLYCOSYLATION PDOC00001

PS00002 305->309 GLYCOSAMINOGLYCAN PDOC00002

PS00005 40->43 PKC_PHOSPHO_SITE PDOC00005

PS00005 43->46 PKC_PHOSPHO_SITE PDOC00005

PS00005 66->69 PKC_PHOSPHO_SITE PDOC00005

PS00006 43->47 CK2_PHOSPHO_SITE PDOC00006

PS00006 71->75 CK2_PHOSPHO_SITE PDOC00006

PS00006 181->185 CK2_PHOSPHO_SITE PDOC00006

PS00006 200->204 CK2_PHOSPHO_SITE PDOC00006

PS00006 260->264 CK2_PHOSPHO_SITE PDOC00006

PS00006 304->308 CK2_PHOSPHO_SITE PDOC00006

PS00006 312->316 CK2_PHOSPHO_SITE PDOC00006

PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006

PS00006 572->576 CK2_PHOSPHO_SITE PDOC00006

PS00008 8->14 MYRISTYL PDOC00008

PS00008 12->18 MYRISTYL PDOC00008

PS00008 19->25 MYRISTYL PDOC00008

PS00008 24->30 MYRISTYL PDOC00008

PS00008 95->101 MYRISTYL PDOC00008

PS00008 124->130 MYRISTYL PDOC00008

PS00008 140->146 MYRISTYL PDOC00008

PS00008 150->156 MYRISTYL PDOC00008

PS00008 153->159 MYRISTYL PDOC00008

PS00008 162->168 MYRISTYL PDOC00008

PS00008 267->273 MYRISTYL PDOC00008

PS00008 293->299 MYRISTYL PDOC00008

PS00008 308->314 MYRISTYL PDOC00008

PS00008 337->343 MYRISTYL PDOC00008

PS00008 343->349 MYRISTYL PDOC00008

PS00008 347->353 MYRISTYL PDOC00008

PS00008 355->361 MYRISTYL PDOC00008

PS00008 366->372 MYRISTYL PDOC00008

PS00008 479->485 MYRISTYL PDOC00008

PS00008 489->495 MYRISTYL PDOC00008

PS00008 492->498 MYRISTYL PDOC00008

PS00008 495->501 MYRISTYL PDOC00008

PS00008 499->505 MYRISTYL PDOC00008

PS00008 573->579 MYRISTYL PDOC00008
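For the three simplest fixed-length patterns, the positions in the table above can be cross-checked with a short regular-expression scan. This is a sketch, not the PEDANT or ScanProsite implementation. The regex translations of PS00001, PS00005 and PS00006 are hand-written. The reading of "start->end" as a 1-based start with end = start + motif length is inferred from the listed coordinates. The myristoylation and glycosaminoglycan patterns are left out, and `PATTERNS`, `SEQ` and `scan` are hypothetical names.

```python
import re

# Hand translations of three PROSITE patterns into Python regular expressions
# (an assumption of this sketch: {P} -> [^P], x -> ., x(2) -> .{2}).
PATTERNS = {
    "PS00001 ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # N-{P}-[ST]-{P}
    "PS00005 PKC_PHOSPHO_SITE": r"[ST].[RK]",  # [ST]-x-[RK]
    "PS00006 CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",  # [ST]-x(2)-[DE]
}

# Deduced peptide of DKFZphfbr2_3l2, frame 3 (589 residues), copied from above.
SEQ = (
    "MAESGESGGPPGSQDSAAGAEGAGAPAAAASAEPKIMKVTVKTPKEKEEF"
    "AVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGL"
    "TVHLVIKTQNRPQDHSAQQTNTAGGNVTTSSTPNSNSTSGSATSNPFGLG"
    "GLGGLAGLSSLGLNTTNFSELQSQMQRQLLSNPEMMVQIMENPFVQSMLS"
    "NPDLMRQLIMANPQMQQLIQRNPEISHMLNNPDIMRQTLELARNPAMMQE"
    "MMRNQDRALSNLESIPGGYNALRRMYTDIQEPMLSAAQEQFGGNPFASLV"
    "SNTSSGEGSQPSRTENRDPLPNPWAPQTSQSSSASSGTASTVGGTTGSTA"
    "SGTSGQSTTAPNLVPGVGASMFNTPGMQSLLQQITENPQLMQNMLSAPYM"
    "RSMMQSLSQNPDLAAQMMLNNPLFAGNPQLQEQMRQQLPTFLQQMQNPDT"
    "LSAMSNPRAMQALLQIQQGLQTLATEAPGLIPGFTPGLGALGSTGGSSGT"
    "NGSNATPSENTSPTAGTTEPGHQQFIQQMLQALAGVNPQLQNPEVRFQQQ"
    "LEQLSAMGFLNREANLQALIATGGDINAAIERLLGSQPS"
)


def scan(seq: str, regex: str) -> list[tuple[int, int]]:
    """Return every hit as (start, end) in the report's 'start->end' style.

    The zero-width lookahead lets overlapping hits be reported (the
    N-glycosylation sites at 164 and 167 share residue 167). Start is
    1-based and end = start + motif length (an inferred convention).
    """
    finder = re.compile(f"(?=({regex}))")
    return [
        (m.start() + 1, m.start() + 1 + len(m.group(1)))
        for m in finder.finditer(seq)
    ]


if __name__ == "__main__":
    for name, regex in PATTERNS.items():
        hits = scan(SEQ, regex)
        print(name, " ".join(f"{start}->{end}" for start, end in hits))
```

Run on this peptide, the scan should reproduce the seven ASN_GLYCOSYLATION, three PKC_PHOSPHO_SITE and nine CK2_PHOSPHO_SITE entries listed above. A plain, non-overlapping `re.finditer` on the bare pattern would miss the 167->171 site.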



Pfam for DKFZphfbr2_3l2.3



HMM_NAME Ubiquitin family 

HMM *MQIFVKTLtGRTcTFEVepQEtVeqIKQHIeekEGIPPeQQRLIFaGRQ 

M ++VKT + +F V+++ V Q+K+ I+ +Q +LIFAG+

Query 37 MKVTVKTPK-EKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKI 

HMM LEDeKTLsDYNIggeSTLHLVlR* 

L D TLS+++I + T+HLV++ 
Query 85 LKDQDTLSQHGIHDGLTVHLVIK 107 
DKFZphfbr2_62b11



group: signal transduction 

DKFZphfbr2_62b11 encodes a novel 655 amino acid putative GTPase-activating protein related to the
human chimaerins.

The small GTPase Rac associates with type I phosphatidylinositol 4-phosphate 5-kinase and
regulates the production of phosphatidylinositol 4,5-bisphosphate. The new protein is
expected to stimulate the GTPase activity of p21rac-related small GTPases.

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 



similarity to CHIMAERIN 

complete cDNA, complete cds, EST hits 

Sequenced by LMU 

Locus: /map^'M" 

Insert length: 4593 bp 

Poly A stretch at pos. 4571, polyadenylation signal at pos. 4553 



1 GGGGGAGTTT GAAGACAGAA AGGAAAGGGG AGAAACCTGC AGAGAGCATC 
51 AAAGGATGGG GGGTGCTATA AAAGAAGCAG GGGGGTCCTT TGAAAGAAAT 
101 CTATCATGCA CTGAAATGCT TTCTGGAGAA GGTGCCGTTA TTTTCCTCCC 
151 CTCTTGCTCA GATGAAAGGA GCCAGCAAGG ACAGTCCTGA AATATTCCTC 
201 AGGGGACTTT TTGTCATTGT TCCTCTTTCC TCTTGCACAG AGCTATTTGC 
251 TGACCTTTCC AGAGGAATCT CAGTCCAGCT GAGAAGACAG TTCTTAATAA 
301 AAACAAAAAA ATGCAAAAAC CAATTCCTGC TGTTTGAATG GGAATGGTAG 
351 CTTGCTTGCT GCAGTTCTTT TCCTGTGACA TTTTGGAATG TCTGCAGAAA 
401 CTTAAAAAAA AGAAAAAAAA AACCTTAAAA ACTCCCTGGA TTAGGCAAGA 
451 GAAAAGGAAG TTTTTTTTTG CTAAACAGGA GTAAATGAGA GGTGGTAACT 
501 TATCCCTAAG CCAGGACCTG GATGATCAAA ACCTTCAAAT TCTAGGGATC 
551 AGCACTTCAA AAATAACAAG TAAACAAGCA TGAGGAGTGG CTGTTGGGTT 
601 TCGCTCAGAG GCAGGTTTTA AAGGAAGCCA AAACCGGGTT CAGAACTTCA 
651 GGCCTGTACG ATGCCTGAAG ACCGGAATTC TGGGGGGTGC CCGGCTGGTG 
701 CCTTAGCCTC AACTCCTTTC ATCCCTAAAA CTACATACAG AAGAATCAAA 
751 CGGTGTTTTA GTTTTCGGAA AGGCATTTTT GGACAGAAAC TGGAGGATAC 
801 TGTTCGTTAT GAGAAGAGAT ATGGGAACCG TCTGGCTCCG ATGTTGGTGG 
851 AGCAGTGCGT GGACTTTATC CGACAAAGGG GGCTGAAAGA AGAGGGTCTC 
901 TTTCGACTGC CAGGCCAGGC TAATCTTGTT AAGGAGCTCC AAGATGCCTT 
951 TGACTGTGGG GAGAAGCCAT CATTTGACAG CAACACAGAT GTACACACGG 
1001 TGGCATCACT TCTTAAGCTG TACCTCCGAG AACTTCCAGA ACCAGTTATT 
1051 CCTTATGCGA AGTATGAAGA TTTTTTGTCA TGTGCCAAAC TGCTCAGCAA 
1101 GGAAGAGGAA GCAGGTGTTA AGGAATTAGC AAAGCAGGTG AAGAGTTTGC 
1151 CAGTGGTAAA TTACAACCTC CTCAAGTATA TTTGCAGATT CTTGGATGAA 
1201 GTACAGTCCT ACTCGGGAGT TAACAAAATG AGTGTGCAGA ACTTGGCAAC 
1251 GGTCTTTGGT CCTAATATCC TGCGCCCCAA AGTGGAAGAT CCTTTGACTA 
1301 TCATGGAGGG CACTGTGGTG GTCCAGCAGT TGATGTCAGT GATGATTAGC 
1351 AAACATGATT GCCTCTTTCC CAAAGATGCA GAACTACAAA GCAAGCCCCA 
1401 AGATGGAGTG AGCAACAACA ATGAAATTCA GAAGAAAGCC ACCATGGGGC 
1451 TGTTACAGAA CAAGGAGAAC AATAACACCA AGGACAGCCC TAGTAGGCAG 
1501 TGCTCCTGGG ACAAGTCTGA GTCACCCCAG AGAAGCAGCA TGAACAATGG 
1551 ATCCCCCACA GCTCTATCAG GCAGCAAAAC CAACAGCCCA AAGAACAGTG 
1601 TTCACAAGCT AGATGTGTCT AGAAGCCCCC CTCTCATGGT CAAAAAGAAC 
1651 CCAGCCTTTA ATAAGGGTAG TGGGATAGTT ACCAATGGGT CCTTCAGCAG 
1701 CAGTAATGCA GAAGGTCTTG AGAAAACCCA AACCACCCCC AATGGGAGCC 
1751 TACAGGCCAG AAGGAGCTCT TCACTGAAGG TATCTGGTAC CAAAATGGGC 
1801 ACGCACAGTG TACAGAATGG AACGGTGCGC ATGGGCATTT TGAACAGCGA 
1851 CACACTCGGG AACCCCACAA ATGTTCGAAA CATGAGCTGG CTGCCAAATG 
1901 GCTATGTGAC CCTGAGGGAT AACAAGCAGA AAGAACAAGC TGGAGAGTTA 
1951 GGCCAGCACA ACAGACTGTC CACCTATGAT AATGTCCATC AACAGTTCTC 
2001 CATGATGAAC CTTGATGACA AGCAGAGCAT TGACAGTGCT ACCTGGTCCA 
2051 CTTCCTCCTG TGAAATCTCC CTCCCTGAGA ACTCCAACTC CTGTCGCTCT 
2101 TCTACCACCA CCTGCCCAGA GCAAGACTTT TTTGGGGGGA ACTTTGAGGA 
2151 CCCTGTTTTG GATGGGCCCC CGCAGGACGA CCTTTCCCAC CCCAGGGACT 
2201 ATGAAAGCAA AAGTGACCAC AGGAGTGTGG GAGGTCGAAG TAGTCGTGCC 
2251 ACCAGTAGCA GTGACAACAG TGAGACATTT GTGGGCAACA GCAGCAGCAA 
2301 CCACAGTGCA CTGCACAGTT TAGTTTCCAG CCTGAAACAG GAAATGACCA 
2351 AACAGAAGAT AGAGTATGAG TCCAGGATAA AGAGCTTAGA ACAGCGAAAC 
2401 TTGACTTTGG AAACAGAAAT GATGAGCCTC CATGATGAAC TGGATCAGGA 
2451 GAGGAAAAAG TTCACAATGA TAGAAATAAA AATGCGAAAT GCCGAGCGAG 
2501 CAAAAGAAGA TGCCGAGAAA AGAAATGACA TGCTACAGAA AGAAATGGAG 
2551 CAGTTTTTTT CCACGTTTGG AGAACTGACA GTGGAACCCA GGAGAACCGA 
2601 GAGAGGAAAC ACAATATGGA TTCAGTGAGC CTGCTTTCGC CTGCTGTCTC
2651 TGATGGCTCT GGCAAGGACT CCAGGGATTC TGGTGGGATA TGACTTAGAA
2701 CCAGGTGGCT GGTCACCTGG ATGTACAGAA GTCTAACTGG TGAAGGAATA
2751 TCATTTACAG ACATTAAACA TCCATATCTG CAATGTGTAC CAAAGTTATA
2801 TCATGCCCCA TAATGCTACT GTCAAGTGTT ACAACTGGAT ATGTGTATAT
2851 AGAGTAGTTT TTCAAAAGTA AACTAAAAAT GAGAAGCATA TTTCAAGAAT
2901 TATTTTATTG CAAGTCTTGT ATTTAAATGT TAAATCAATA TGTTGTTGCA
2951 ATTTAGCTTG CTTTCAAGCT TCACCCCTTG CACTTAACAT AAGCTATTTT
3001 TGGCATTGTG TTATCATCGG CTTATTTTAT AGATCAATAT TTTTATTTCC
3051 CTTTTTTGCT GAGGAAATGA AGATAAGCAA AAATATAAAT ATATATATAA
3101 ATATATGAGT TATTAAAACC AGAAGAATAC TTTGTGGCTG TGCTGTTTGT
3151 GCCAATAGAC TTTGTCATGA CCAAAAAGAG AAATGTAAAT AGTTTTATAA
3201 AATACAGTCG AATCACCAGG AACCTTTGAG CTGCTTTTAA AATTCTTCCC
3251 CTGGCACCAC TCAGTTTTGC TTTTGCGAGG CGATTTGACA TAGGAACTTT
3301 GAGACTCCAT GAGAAAGTCC CTTTCTGAGG CCCACTGTCT ACCTTGCCAG
3351 ATCCTCAGTG CGTATCGCCA ATGCAGGATG CTCCTTAGAA AAGAAAAAAT
3401 GGTAAAGGAT GGCATTTAAC GATTCAGGCT TTGAATTACT CTGTCCCTCT
3451 GGACCGAATC TCTTTAACTG CTGGATAGTT TTAGAGGAAT TCTCCTGCTA
3501 CTTAGGTACT GGGAAACAAT GCTTGCTAAA CCATGCCCAC GTGAGCACCT
3551 GTCTCCCACT CAAACCTCTC CCATCTCCCA ACAACTGCAC TTTAGAATAC
3601 CAGCAGTGAA ATGGTATTAC TGTTTCCCTC TGAGTGAAAC TGCTAGAGTA
3651 TATGTCACGT AGTGACATTT TTTTCTCACT CAGGCTATTG CCATCTGGGA
3701 TTCTCTCCCT ACTACAGCTG GCAAAGTTGG TTTGCAGCAA GAAGATAGTG
3751 GGAGGGGGCC AGGCTGCAGG AGAAGGAGAA AAGTTTAGAA GAAACAAACC
3801 ATTTTGCTTC TAATTTTGAC AGTATCACTT TCCTGTTAAA ACATACAATA
3851 ATTTTAAAAG GTGAATGCCT AAAGTTCCAA TTTTAGCAAA TATGGGAACC
3901 TCAGCAATGC TAATTTTCTA GAAAAACCCA GGGCTCTTTG GAGCTAGAGT
3951 TTTGGGAGAA CAGTTCTTCA CAATAAGGCA ATGGTTTTGA GAGGCCAGGC
4001 AAATAATCTT TCTCACCGTA GAACAAAAAG TTACAAAAGG CATAATCGGA
4051 AATAGAGACT ACATACTTGA GTTTATGGGG TTTGTGTTGT TTGAAGGTTC
4101 AATGCTTGCA TGTGTTTATT TATTTTCAAG AGGGAAAGTG GTCTGTACTG
4151 CTTTCATCCT TGCCACTGTC TTGCTTTTAT TTTTTACTCT CCCACTGAGC
4201 AAGCGTCTGT GGTCCTATGG TATCAACCAG TATCTTTATA GCAATAATTT
4251 CTTTAATTCC CTTTTCTCTC TCTTTCCAAT TATTTAACCA GTTACTTCCA
4301 CCTGGACATA CGATAGGAAA TTCAAACTCA AAATATGAAA ATTGATCTTA
4351 ATAACTCTCC CTTCATATCT TTTCACCTAT TTCCAGTCCT TATCATAGTT
4401 GATAAAAACC TCAGACTCAT CCAGAAAGCT ATATGATGCA CTAGTAAAAA
4451 AAACAAAGAT ATTTAAACTG CTTGGGTTCA AATGGTATAC AATTTGCCAG
4501 CTGTTACTGA ACCTTCTATG CATAACTTTT TTTTTCCTCT GTGCAATTGG
4551 AATAATAAAA ATACTACTCC CATAAAAAAA AAAAAAAAAA AAC



BLAST Results 



Entry G38474 from database EMBLNEW: 

SHGC-58303 Human Homo sapiens STS genomic, sequence tagged site. 
Score = 2175, P = 1.2e-92, identities = 439/441



Medline entries 



97476250: 

Beta2-chimaerin is a high affinity receptor for the phorbol ester tumor
promoters.



Peptide information for frame 1 



ORF from 661 bp to 2625 bp; peptide length: 655
Category: similarity to known protein 



1 MPEDRNSGGC PAGALASTPF IPKTTYRRIK RCFSFRKGIF GQKLEDTVRY 

51 EKRYGNRLAP MLVEQCVDFI RQRGLKEEGL FRLPGQANLV KELQDAFDCG 

101 EKPSFDSNTD VHTVASLLKL YLRELPEPVI PYAKYEDFLS CAKLLSKEEE 

151 AGVKELAKQV KSLPVVNYNL LKYICRFLDE VQSYSGVNKM SVQNLATVFG 

201 PNILRPKVED PLTIMEGTVV VQQLMSVMIS KHDCLFPKDA ELQSKPQDGV 

251 SNNNEIQKKA TMGLLQNKEN NNTKDSPSRQ CSWDKSESPQ RSSMNNGSPT 

301 ALSGSKTNSP KNSVHKLDVS RSPPLMVKKN PAFNKGSGIV TNGSFSSSNA 

351 EGLEKTQTTP NGSLQARRSS SLKVSGTKMG THSVQNGTVR MGILNSDTLG 

401 NPTNVRNMSW LPNGYVTLRD NKQKEQAGEL GQHNRLSTYD NVHQQFSMMN 

451 LDDKQSIDSA TWSTSSCEIS LPENSNSCRS STTTCPEQDF FGGNFEDPVL

501 DGPPQDDLSH PRDYESKSDH RSVGGRSSRA TSSSDNSETF VGNSSSNHSA 

551 LHSLVSSLKQ EMTKQKIEYE SRIKSLEQRN LTLETEMMSL HDELDQERKK 
601 FTMIEIKMRN AERAKEDAEK RNDMLQKEME QFFSTFGELT VEPRRTERGN
651 TIWIQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_62bll, frame 1 

SWISSPROT:Y053_HUMAN HYPOTHETICAL PROTEIN KIAA0053., N = 3, Score =
661, P = 2.4e-89

TREMBL:HSU90908_1 product: "unknown"; Human clones 23549 and 23762
mRNA, complete cds., N = 1, Score = 348, P = 1.1e-29

PIR:S29128 N-chimerin - rat, N = 1, Score = 286, P = 2.8e-24

PIR:S29956 beta-chimaerin - rat, N = 1, Score = 279, P = 1.6e-23

TREMBL:AB014572_1 gene: "KIAA0672"; product: "KIAA0672 protein"; Homo
sapiens mRNA for KIAA0672 protein, complete cds., N = 1, Score = 314, P
= 1e-24



>SWISSPROT:Y053_HUMAN HYPOTHETICAL PROTEIN KIAA0053.
Length = 638

HSPs:

Score = 661 (99.2 bits), Expect = 2.4e-89, Sum P(3) = 2.4e-89
Identities = 122/209 (58%), Positives = 160/209 (76%)

Query: 38 GIFGQKLEDTVRYEKRYGNRLAPMLVEQCVDFIRQRGLKEEGLFRLPGQANLVKELQDAF 97

G+FGQ+L++TV YE+++G L P+LVE+C +FI + G EEG+FRLPGQ NLVK+L+DAF
Sbjct: 148 GVFGQRLDETVAYEQKFGPHLVPILVEKCAEFILEHGRNEEGIFRLPGQDNLVKQLRDAF 207

Query: 98 DCGEKPSFDSNTDVHTVASLLKLYLRELPEPVIPYAKYEDFLSCAKLLSKEEEAGVKELA 157

D GE+PSFD +TDVHTVASLLKLYLR+LPEPV+P+++YE FL C +L + +E +EL
Sbjct: 208 DAGERPSFDRDTDVHTVASLLKLYLRDLPEPVVPWSQYEGFLLCGQLTNADEAKAQQELM 267

Query: 158 KQVKSLPVVNYNLLKYICRFLDEVQSYSGVNKMSVQNLATVFGPNILRPKVEDPLTIMEG 217

KQ+ LP NY+LL YICRFL E+Q VNKMSV NLATV G N++R KVEDP IM G
Sbjct: 268 KQLSILPRDNYSLLSYICRFLHEIQLNCAVNKMSVDNLATVIGVNLIRSKVEDPAVIMRG 327

Query: 218 TVVVQQLMSVMISKHDCLFPKDAELQSKP 246

T +Q++M++MI H+ LFPK ++ P
Sbjct: 328 TPQIQRVMTMMIRDHEVLFPKSKDIPLSP 356

Score = 210 (31.5 bits), Expect = 2.4e-89, Sum P(3) = 2.4e-89
Identities = 45/115 (39%), Positives = 73/115 (63%)

Query: 531 TSSSDNSETFVGNSSSNHSALHSL---VSSLKQEMTKQKIEYESRIKSLEQRNLTLETEM 587

T +S NSET G +S + SL V L++E+ QK YE +IK+LE+ N + ++
Sbjct: 523 TLASPNSETGPGKKNSGEEEIDSLQRMVQELRKEIETQKQMYEEQIKNLEKENYDVWAKV 582

Query: 588 MSLHDELDQERKKFTMIEIKMRNAERAKEDAEKRNDMLQKEMEQFFSTFGELTVE 642

+ L++EL++E+KK +EI +RN ER++ED EKRN L++E+++F + E E
Sbjct: 583 VRLNEELEKEKKKSAALEISLRNMERSREDVEKRNKALEEEVKEFVKSMKEPKTE 637

Score = 70 (10.5 bits), Expect = 1.2e-74, Sum P(3) = 1.2e-74
Identities = 28/121 (23%), Positives = 54/121 (44%)

Query: 528 SRATSSSDNSETFVGNSSSNHSALHSLVSSLKQE-MTKQKIEYESRIKSLEQRNL-TLET 585

S+ TS+ DN + G+ SAL S K + + E K+ + + +L+
Sbjct: 489 SQRTSTYDNVPSLPGSPGEEASALSSQACDSKGDTLASPNSETGPGKKNSGEEEIDSLQR 548

Query: 586 EMMSLHDELDQERKKFTMIEIKMRNAERAKEDAEKRNDMLQKEMEQFFSTFGELTVEPRR 645

+ L E++ +++ M E +++N E+ D + L +E+E+ L + R
Sbjct: 549 MVQELRKEIETQKQ---MYEEQIKNLEKENYDVWAKVVRLNEELEKEKKKSAALEISLRN 605

Query: 646 TER 648
ER 

Sbjct: 606 MER 608 

Score = 53 (8.0 bits), Expect = 2.4e-89, Sum P(3) = 2.4e-89
Identities = 31/111 (27%), Positives = 46/111 (41%)



Query: 344 SFSSSNAEGLEKTQTTPNGSLQARRSSSLKVSGTKMGTHSVQNG----TV--RMGILNSD 397

SFSS +++TT A SKV KG +Q+ T+ R L S 

Sbjct: 388 SFSSMTSDS-DTTSPTGQQPSDAFPEDSSKVPREKPGDWKMQSRKRTQTLPNRKCFLTSA 446 



260 



WO 01/12659 



PCT/IB00/01496



Query: 398 TLG-NPTNV---RNMSWLPNGYVTLRDNKQKEQAGELGQ---HNRLSTYDNV 442

G N + + +N W P+ + ++ + +L Q R STYDNV 

Sbjct: 447 FQGANSSKMEIFKNEFWSPSSEAKAGEGHRRTMSQDLRQLSDSQRTSTYDNV 498

Score = 53 (8.0 bits), Expect = 3.5e-14, Sum P(3) = 3.5e-14
Identities = 32/125 (25%), Positives = 56/125 (44%) 

Query: 242 LQSKPQDG---VSNNNEIQKKATMGLLQNKEN--NNTKD---SPSRQCSWDKSESPQRSS 293

++SK +D + +IQ+ TM ++++ E +KD SP Q + K RSS 

Sbjct: 314 IRSKVEDPAVIMRGTPQIQRVMTM-MIRDHEVLFPKSKDIPLSPPAQKNDPKKAPVARSS 372 

Query: 294 MNNGSPTALSGSKTNSPKNSVHKLDVSRSPPLMVKKNPAFNKGSGIVTNGSFSSSNAEGL 353 

+ + L S+T+S + D+P++AF + SV + 

Sbjct: 373 VGWDATEDLRISRTDSFSSMTSDSDTTS--PTGQQPSDAFPEDSSKVPREKPGDWKMQSR 430

Query: 354 EKTQTTPN 361 

++TQT PN 
Sbjct: 431 KRTQTLPN 438 

Pedant information for DKFZphfbr2_62b11, frame 1



Report for DKFZphfbr2_62b11.1

[LENGTH] 655

[MW] 73394.60

[PI] 8.13

[HOMOL] SWISSPROT:Y053_HUMAN HYPOTHETICAL PROTEIN KIAA0053. 3e-71

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPL115c] 1e-16

[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YPL115c] 1e-16

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YPL115c] 1e-16

[FUNCAT] 10.02.09 regulation of g-protein activity [S. cerevisiae, YPL115c] 1e-16

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER155c] 2e-16

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER155c] 2e-16

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YDR379w] 4e-16

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDL240w] 3e-15

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR134w] 2e-13

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR134w] 2e-13

[SCOP] d1rgp 1.83.1.1.1 p50 RhoGAP domain [human (Homo sapiens) 2e-46

[SCOP] d1pbwa_ 1.83.1.1.2 p85 alpha subunit RhoGAP domain [human (Hom 6e-37

[PIRKW] phosphotransferase 3e-13

[PIRKW] breakpoint cluster region 2e-20

[PIRKW] transmembrane protein 7e-14

[PIRKW] brain 2e-20

[PIRKW] alternative splicing 2e-20

[PIRKW] P-loop 9e-19

[PIRKW] cytoskeleton 1e-08

[SUPFAM] CDC24 homology 7e-21

[SUPFAM] bcr protein 7e-21

[SUPFAM] myosin motor domain homology 9e-19

[SUPFAM] pleckstrin repeat homology 2e-15

[SUPFAM] LIM metal-binding repeat homology 9e-15

[SUPFAM] protein kinase C zinc-binding repeat homology 5e-24

[PROSITE] MYRISTYL 16

[PROSITE] CAMP_PHOSPHO_SITE 3

[PROSITE] CK2_PHOSPHO_SITE 15

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 11

[PROSITE] ASN_GLYCOSYLATION 8

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 6.87 % 

[KW] COILED_COIL 12.06 % 

SEQ MPEDRNSGGCPAGALASTPFIPKTTYRRIKRCFSFRKGIFGQKLEDTVRYEKRYGNRLAP

SEG

COILS

1rgp- C

SEQ MLVEQCVDFIRQRGLKEEGLFRLPGQANLVKELQDAFDCGEKPSFDSNTDVHTVASLLKL

SEG

COILS

1rgp- HHHHHHHHHHHHHHTTTTTTTTTCCCHHHHHHHHHHHHHCCCCCGGGCCCCHHHHHHHHH

SEQ YLRELPEPVIPYAKYEDFLSCAKLLSKEEEAGVKELAKQVKSLPVVNYNLLKYICRFLDE
SEG
COILS

1rgp- HHHHTTTTTTTGGGHHHHHH TTTTCGGGHHHHHHHHHHHCCHHHHHHHHHHHHHHHH

SEQ VQSYSGVNKMSVQNLATVFGPNILRPKVEDPLTIMEGTVVVQQLMSVMISKHDCLFPKDA

SEG

COILS

1rgp- HHHHHHHHCCCHHHHHHHHGGGCC

SEQ ELQSKPQDGVSNNNEIQKKATMGLLQNKENNNTKDSPSRQCSWDKSESPQRSSMNNGSPT

SEG

COILS

1rgp-

SEQ ALSGSKTNSPKNSVHKLDVSRSPPLMVKKNPAFNKGSGIVTNGSFSSSNAEGLEKTQTTP

SEG

COILS

1rgp-

SEQ NGSLQARRSSSLKVSGTKMGTHSVQNGTVRMGILNSDTLGNPTNVRNMSWLPNGYVTLRD

SEG

COILS

1rgp-

SEQ NKQKEQAGELGQHNRLSTYDNVHQQFSMMNLDDKQSIDSATWSTSSCEISLPENSNSCRS

SEG xxxxxxx

COILS

1rgp-

SEQ STTTCPEQDFFGGNFEDPVLDGPPQDDLSHPRDYESKSDHRSVGGRSSRATSSSDNSETF

SEG xxxxx xxxxxxxxxxxxxxxxx. . .

COILS

1rgp-

SEQ VGNSSSNHSALHSLVSSLKQEMTKQKIEYESRIKSLEQRNLTLETEMMSLHDELDQERKK

SEG . . xxxxxxxxxxxxxxxx

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

1rgp-

SEQ FTMIEIKMRNAERAKEDAEKRNDMLQKEMEQFFSTFGELTVEPRRTERGNTIWIQ

SEG

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

1rgp-



Prosite for DKFZphfbr2_62b11.1



PS00001 271->275 ASN_GLYCOSYLATION PDOC00001

PS00001 342->346 ASN_GLYCOSYLATION PDOC00001

PS00001 361->365 ASN_GLYCOSYLATION PDOC00001

PS00001 386->390 ASN_GLYCOSYLATION PDOC00001

PS00001 407->411 ASN_GLYCOSYLATION PDOC00001

PS00001 543->547 ASN_GLYCOSYLATION PDOC00001

PS00001 547->551 ASN_GLYCOSYLATION PDOC00001

PS00001 580->584 ASN_GLYCOSYLATION PDOC00001

PS00004 258->262 CAMP_PHOSPHO_SITE PDOC00004

PS00004 367->371 CAMP_PHOSPHO_SITE PDOC00004

PS00004 599->603 CAMP_PHOSPHO_SITE PDOC00004

PS00005 25->28 PKC_PHOSPHO_SITE PDOC00005

PS00005 34->37 PKC_PHOSPHO_SITE PDOC00005

PS00005 47->50 PKC_PHOSPHO_SITE PDOC00005

PS00005 309->312 PKC_PHOSPHO_SITE PDOC00005

PS00005 371->374 PKC_PHOSPHO_SITE PDOC00005

PS00005 388->391 PKC_PHOSPHO_SITE PDOC00005

PS00005 417->420 PKC_PHOSPHO_SITE PDOC00005

PS00005 477->480 PKC_PHOSPHO_SITE PDOC00005

PS00005 527->530 PKC_PHOSPHO_SITE PDOC00005

PS00005 557->560 PKC_PHOSPHO_SITE PDOC00005

PS00005 646->649 PKC_PHOSPHO_SITE PDOC00005

PS00006 107->111 CK2_PHOSPHO_SITE PDOC00006

PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006

PS00006 213->217 CK2_PHOSPHO_SITE PDOC00006

PS00006 230->234 CK2_PHOSPHO_SITE PDOC00006

PS00006 348->352 CK2_PHOSPHO_SITE PDOC00006

PS00006 417->421 CK2_PHOSPHO_SITE PDOC00006

PS00006 437->441 CK2_PHOSPHO_SITE PDOC00006

PS00006 465->469 CK2_PHOSPHO_SITE PDOC00006

PS00006 470->474 CK2_PHOSPHO_SITE PDOC00006

PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006

PS00006 516->520 CK2_PHOSPHO_SITE PDOC00006

PS00006 532->536 CK2_PHOSPHO_SITE PDOC00006

PS00006 589->593 CK2_PHOSPHO_SITE PDOC00006

PS00006 602->606 CK2_PHOSPHO_SITE PDOC00006

PS00006 635->639 CK2_PHOSPHO_SITE PDOC00006

PS00007 43->51 TYR_PHOSPHO_SITE PDOC00007

PS00007 176->185 TYR_PHOSPHO_SITE PDOC00007

PS00008 8->14 MYRISTYL PDOC00008

PS00008 9->15 MYRISTYL PDOC00008

PS00008 13->19 MYRISTYL PDOC00008

PS00008 249->255 MYRISTYL PDOC00008

PS00008 263->269 MYRISTYL PDOC00008

PS00008 297->303 MYRISTYL PDOC00008

PS00008 304->310 MYRISTYL PDOC00008

PS00008 338->344 MYRISTYL PDOC00008

PS00008 343->349 MYRISTYL PDOC00008

PS00008 352->358 MYRISTYL PDOC00008

PS00008 362->368 MYRISTYL PDOC00008

PS00008 376->382 MYRISTYL PDOC00008

PS00008 392->398 MYRISTYL PDOC00008

PS00008 400->406 MYRISTYL PDOC00008

PS00008 524->530 MYRISTYL PDOC00008

PS00008 542->548 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_62b11.1)
DKFZphfbr2_62f10



group: intracellular transport and trafficking 

DKFZphfbr2_62f10 encodes a novel 320 amino acid protein with strong similarity to mammalian
zinc transporter proteins.

The novel protein is a membrane protein that is expected to be involved in the transport of zinc
across the cell membrane.

The ZnT transporters are membrane proteins that facilitate the sequestration of zinc in
endosomal vesicles. In the brain, ZnT-3 appears to be involved in the accumulation of zinc
in synaptic vesicles. Zinc (Zn) is an essential element in normal development and metabolism.
Recent studies show that in Alzheimer's disease, Zn functions as a double-edged sword:
it protects against Alzheimer's amyloid beta peptide (the major component of senile
plaques) at low concentrations, and it enhances toxicity at high concentrations by accelerating
aggregation of the amyloid beta peptide.

The new protein can find application in modulating zinc transport in neuronal cells, thus
providing a means of modulating amyloid beta peptide plaque formation in Alzheimer's disease.



strong similarity to zinc transporter proteins;
membrane regions: 5 

Summary DKFZphfbr2_62f10 encodes a novel 320 amino acid protein with
similarity to zinc transporter proteins.

The new protein can find clinical application in modulating Zn2+
uptake.



strong similarity to zinc transporter proteins 

complete cDNA, complete cds, few EST hits 

Sequenced by LMU 

Locus: unknown

Insert length: 5422 bp

Poly A stretch at pos. 5397, polyadenylation signal at pos. 5381



1 GTCTAACTTT GGAAATATCA CCCTCATGCT GTCTTCCCAG GATGTCTCTC

51 TCCCTAAGTA AGGGATGTTA CTTCCTGGAG GGAATGCAGT GTTGGGAATC

101 TGAAGACCCA GCTTTGAGCT GAATTTGCTT TGTGATACCT GGAGAGAAGA

151 CGTGTTTTCT TGACAACAGC ACAGTACCTA GTGAGTTCAA CAACAACGAC

201 AACAACAGCC GCAGCTCATC CTGGCCGTCA TGGAGTTTCT TGAAAGAGCG

251 TATCTTGTGA ATGATAAAGC TGCCAAGATG TATGCTTTCA CACTAGAAAG

301 AAGGAGCTGC AAATGAACAC TTCATAGCAA TGTGGAACTC CAACAGAAAC

351 CGGTGAATAA AGATCAGTGT CCCAGAGAGA GACCAGAGGA GCTGGAGTCA

401 GGAGGCATGT ACCACTGCCA CAGTGGCTCC AAGCCCACAG AAAAGGGGGC

451 GAATGAGTAC GCCTATGCCA AGTGGAAACT CTGTTCTGCT TCAGCAATAT

501 GCTTCATTTT CATGATTGCA GAGGTCGTGG GTGGGCACAT TGCTGGGAGT

551 CTTGCTGTTG TCACAGATGC TGCCCACCTC TTAATTGACC TGACCAGTTT

601 CCTGCTCAGT CTCTTCTCCC TGTGGTTGTC ATCGAAGCCT CCCTCTAAGC

651 GGCTGACATT TGGATGGCAC CGAGCAGAGA TCCTTGGTGC CCTGCTCTCC

701 ATCCTGTGCA TCTGGGTGGT GACTGGCGTG CTAGTGTACC TGGCATGTGA

751 GCGCCTGCTG TATCCTGATT ACCAGATCCA GGCGACTGTG ATGATCATCG

801 TTTCCAGCTG CGCAGTGGCG GCCAACATTG TACTAACTGT GGTTTTGCAC

851 CAGAGATGCC TTGGCCACAA TCACAAGGAA GTACAAGCCA ATGCCAGCGT

901 CAGAGCTGCT TTTGTGCATG CCCCTGGAGA TCTATTTCAG AGTATCAGTG

951 TGCTAATTAG TGCACTTATT ATCTACTTTA AGCCAGAGTA TAAAATAGCC

1001 GACCCAATCT GCACATTCAT CTTTTCCATC CTGGTCTTGG CCAGCACCAT

1051 CACTATCTTA AAGGACTTCT CCATCTTACT CATGGAAGGT GTGCCAAAGA

1101 GCCTGAATTA CAGTGGTGTG AAAGAGCTTA TTTTAGCAGT CGACGGGGTG

1151 CTGTCTGTGC ACTGCCTGCA CATCTGGTCT CTAACAATGA ATCAAGTAAT

1201 TCTCTCAGCT CATGTTGCTA CAGCAGCCAG CCGGGACAGC CAAGTGGTTC

1251 GGAGAGAAAT TGCTAAAGCC CTTAGCAAAA GCTTTACGAT GCACTCACTC

1301 ACCATTCAGA TGGAATCTCC AGTTGACCAG GACCCCGACT GCCTTTTCTG

1351 TGAAGACCCC TGTGACTAGC TCAGTCACAC CGTCAGTTTC CCAAATTTGA

1401 CAGGCCACCT TCAAACATGC TGCTATGCAA TTTCTGCATC ATAGAAAATA

1451 AGGAACCAAA GGAAGAAATT CATGTCATGG TGCAATGCAT ATTTTATCTA

1501 TTTATTTAGT TCCATTCACC ATGAAGGAAG AGGCACTGAG ATCCATCAAT

1551 CAATTGGATT ATATACTGAT CAGTAGCTGT GTTCAATTGC AGGAATGTGT

1601 ATATAGATTA TTCCTGAGTG GAGCCGAAGT AACAGCTGTT TGTAACTATC

1651 GGCAATACCA AATTCATCTC CCTTCCAATA ATGCATCTTG AGAACACATA

1701 GGTAAATTTG AACTCAGGAA AGTCTTACTA GAAATCAGTG GAAGGGACAA

1751 ATAGTCACAA AATTTTACCA AAACATTAGA AACAAAAAAT AAGGAGAGCC

1801 AAGTCAGGAA TAAAAGTGAC TCTGTATGCT AACGCCACAT TAGAACTTGG

1851 TTCTCTCACC AAGCTGTAAT GTGATTTTTT TTTCTACTCT GAATTGGAAA

1901 TATGTATGAA TATACAGAGA AGTGCTTACA ACTAATTTTT ATTTACTTGT 

1951 CACATTTTGG CAATAAATCC CTCTTATTTC TAAATTCTAA CTTGTTTATT 

2001 TCAAAACTTT ATATAATCAC TGTTCAAAAG GAAATATTTT CACCTACCAG 

2051 AGTGCTTAAA CACTGGCACC AGCCAAAGAA TGTGGTTGTA GAGACCCAGA 

2101 AGTCTTCAAG AACAGCCGAC AAAAACATTC GAGTTGACCC CACCAAGTTG 

2151 TTGCCACAGA TAATTTAGAT ATTTACCTGC AAGAAGGAAT AAAGCAGATG 

2201 CAACCAATTC ATTCAGTCCA CGAGCATGAT GTGAGCACTG CTTTGTGCTA 

2251 GACATTGGGC TTAGCACTGA AACTATAAAG AGGAATCAGA CGCAGCAAGT 

2301 GCTTCTGTGT TCTGGTAGCA ACTCAACACT ATCTGTGGAG AGTAAACTGA 

2351 AGATGTGCAG GCCAACATTC TGGAAATCCT ATGTCAGTGG GTTTGGTTTG 

2401 GAACCTGGAC TTCTGCATTT TTAAAAGTTA CCCAGAGATG CTTCTAAAGA 

2451 TGAGCCATAG TCTAGAAGAT TGTCAACCAC AGGAGTTCAT TGAGTGGGAC 

2501 AGCTAGACAC ATACATTGGC AGTTACAATA GTATCATGAA TTGCAATGAT 

2551 GTAGTGGGGT ATAAAAGGAA AGCGATGGAT ATTGCCGGAT GGGCATGGCC 

2601 AGTGATGTTT CACGTCATTG AGGTGACAGC TCTGCTGGAC TTTGAATTAC 

2651 ATATGGAGGC TCTCCAGGAA GACGAAGAAG AGAAGGACAT TCTAGGCAAA 

2701 AAGAAGACTA GGCACAAGGC ACACTTATGT TTGTCTGTTA GCTTTTAGTT 

2751 GAAAAAGCAA AATACATGAT GCAAAGAAAC CTCTCCACGC TGTGATTTTT 

2801 AAAACTACAT ACTTTTTGCA ACTTTATGGT TATGAGTATT GTAGAGAACA 

2851 GGAGATAGGT CTTAGATGAT TTTTATGTTG TTGTCAGACT CTAGCAAGGT 

2901 ACTAGAAACC TAGCAGGCAT TAATAATTGT TGAGGCAATG ACTCTGAGGC 

2951 TATATCTGGG CCTTGTCATT ATT T AT C ATT TATATTTGTA TTTTTTTCTG 

3001 AAATTTGAGG GCCAAGAAAA CATTGACTTT GACTGAGGAG GTCACATCTG 

3051 TGCCATCTCT GCAAATCAAT CAGCACCACT GAAATAACTA CTTAGCATTC 

3101 TGCTGAGCTT TCCCTGCTCA GTAGAGACAA ATATACTCAT CCCCCACCTC 

3151 AGTGAGCTTG TTTAGGCAAC CAGGATTAGA GCTGCTCAGG TTCCCAACGT 

3201 CTCCTGCCAC ATCGGGTTCT CAAAATGGAA AGAATGGTTT ATGCCAAATC 

3251 ACTTTTCCTG TCTGAAGGAC CACTGAATGG TTTTGTTTTT CCATATTTTG 

3301 CATAGGACGC CCTAAAGACT AGGTGACTTG GCAAACACAC AAGTGTTAGT 

3351 ATAATTCTTT GCTTCTGCTT CTTTTTGAAA ATCATGTTTA GATTTGATTT 

3401 TAAGTCAGAA ATTCACTGAA TGTCAGGTAA TCATTATGGA GGGAGATTTG 

3451 TGTGTCAACC AAAGTAATTG TCCCATGGCC CCAGGGTATT TCTGTTGTTT 

3501 CCCTGAAATT CTGCTTTTTT AGTCAGCTAG ATTGAAAACT CTGAACAGTA 

3551 GATGTTTATA TGGCAAAATG CAAGACAATC TATAAGGGAG ATTTTAAGGA 

3601 TTTTGAGATG AAAAAACAGA TGCTACTCAG GGGCTTTATG GACCATCCAT 

3651 CAATTCTGAA GTTCTGACTC TCCCATTACC CTTTCCCTGG TGTGGTCAGA 

3701 ACTCCAGGTC ACTGGAAGTT AGTGGAATCA TGTAGTTGAA TTCTTTACTT 

3751 CAAGACATTG TATTCTCTCC AGCTATCAAA ACATTAATGA TCTTTTATGT 

3801 CTTTTTTTTG TTATTGTTAT ACTTTAAGTT CTGGGGTACA TGTGCGGAAC 

3851 ATGTAGGTTT GTTACATAGG TATACATGTG CCATGGTGGT TTGCTGCACT 

3901 CATCAACCTG TCATCTACAT TCTTTTATGT CTGTCTTTCA AAGCAACACT 

3951 CTGTTCTTCT GAGTAGTGAA ATCAGGTCAA CTTTACCACC AGCCTCCATT 

4001 TTTAATATGC TTCACCATCA TCCAGCACCT ACTTAAGATT TATCTAGGGC 

4051 TCTGTGGTGA TGTTAGGACC CATAAAAGAA ATTTATGCCT TCCATATGTT 

4101 TGGTTACAGA TGGGAAATGG GAATGTTGAA GGACATGAAA GAAAGGATGT 

4151 TTACACATTA AGCATCAGTT CTGAAGCTAG ATTGTCTGAG TTTGAATCTT 

4201 AGCTCTTCCC TTTATTAGCT CTGTGACCTC GAGCTAGTTA CTTAAATGCT 

4251 CTGATCCTCT ATTTCCTGAT CAGTGAAACC TCCCTATTCA AATGTGTGAG 

4301 AGTTTAATAA ATTAGGACAC TTAAAAATGT TGGAGCAGTG CATAGCATGT 

4351 AGTGTTCAGT ACATGTTAAA TGTTGTTTTT TATTATGTAC AAACATGTGT 

4401 GGGCACAGAA TTTTAAATCA TCTCAACTTT TGAGAAATTT TGAGTTATCA 

4451 ACACCGTTCC CACAAGACAG TGGCAAAATT ATTGGTGAGA ATTAAACAGC 

4501 TGTTTCTCAG AGGAAGCAAT GGAGGCTTGC TGGGATAAAG GCATTTACTG 

4551 AGAGGCTGTT ACCTAGTGAG AGTGATGAAT TAATTAAAAT AGTCGAATCC 

4601 CTTTCTGACT GTCTCTGAAA GCTTCCGCTT TTATCTTTGA AGAGCAGAAT 

4 651 TGTCACCCCA AGGACATTTA TTAATAAAAA GAACAACTGT CCAGTGCAAT 

4701 GAAGGCAAAG TCATAGGTCT CCCAAGTCTT ACCCCATTCC TGTGAAATAT 

4751 CAAGTTCTTG GCTTTTCTCT GTCATGTAGC CTCAACTTTC TCCGACCGGG 

4801 TGCATTTCTT TCTCTGGTTT CTAAATTGCC AGTGGCAAAT TTGGATCACT 

4851 TACTTAATAT CTGTTAAATT TTGTGACCCA ACAAAGTCTT TTAGCACTGT 

4901 GGTGTCAAAA AGAAAAACAC CTCCCAGGCA TATACATTTT ATAGATTCCT 

4951 GGAGAATGTT GCTCTCCAGC TCCATCCCCA CCCAATGAAA TATGATCCAG 

5001 AGAGTCTTGC AAAGAGACAA GCCTCATTTT CCACAATTAG CTCTAAAGTG 

5051 CCTCCAGGAA ATGATTTTCT CAGCTCATCT CTCTGTATTC CCTGTTTTGG 

5101 ATCACAGGGC AATCTGTTTA AATGACTAAT TACAGAAATC ATTAAAGGCA 

5151 CCAAGCAAAT GTCATCTCTG AATACACACA TCCCAAGCTT TACAAATCCT 

5201 GCCTGGCTTG ACAGTGATGA GGCCACTTAA CAGTCCAGCG CAGGCGGATG 

5251 TTAAAAAAAA TAAAAAGGTG ACCATCTGCG GTTTAGTTTT TTAACTTTCT 

5301 GATTTCACAC TTAACGTCTG TCATTCTGTT ACTGGGCACC TGTTTAAATT 

5351 CTATTTTAAA ATGTTAATGA GTGTTGTTTA AAATAAAATC AGGAAAGAGA 

5401 GAAAAAAAAA AAAAAAAAAA AC 



BLAST Results 



No BLAST result 



Medline entries

97121493:

ZnT-3, a putative transporter of zinc into synaptic vesicles. 
96203098: 

ZnT-2, a mammalian protein that confers resistance to zinc by 
facilitating vesicular 
sequestration. 



Peptide information for frame 2 



ORF from 407 bp to 1366 bp; peptide length: 320 
Category: strong similarity to known protein 
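
The ORF coordinates and peptide lengths given in this document are mutually consistent if the coordinates are read as inclusive, 1-based positions that cover the sense codons only, without the stop codon (in the 62f10 insert, for example, the codon immediately after position 1366 is TAG). A minimal Python check under that assumption; the helper name `orf_peptide_length` is illustrative and not part of the original analysis pipeline:

```python
def orf_peptide_length(start: int, end: int) -> int:
    """Codon count for an ORF given inclusive, 1-based coordinates.

    Assumes the range spans whole sense codons and excludes the stop codon.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a multiple of 3")
    return span // 3


# (ORF start, ORF end, reported peptide length) for the clones
# 62f10, 62n10 and 62o17 as reported in this document
REPORTED = [(407, 1366, 320), (263, 1885, 541), (56, 901, 282)]

for start, end, length in REPORTED:
    computed = orf_peptide_length(start, end)
    status = "consistent" if computed == length else "MISMATCH"
    print(f"ORF {start}-{end}: {computed} codons, reported {length} -> {status}")
```

A span that is not a multiple of three would point either to a different coordinate convention or to an error in the listed coordinates.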



1 MYHCHSGSKP TEKGANEYAY AKWKLCSASA ICFIFMIAEV VGGHIAGSLA
51 VVTDAAHLLI DLTSFLLSLF SLWLSSKPPS KRLTFGWHRA EILGALLSIL
101 CIWVVTGVLV YLACERLLYP DYQIQATVMI IVSSCAVAAN IVLTVVLHQR
151 CLGHNHKEVQ ANASVRAAFV HAPGDLFQSI SVLISALIIY FKPEYKIADP 
201 ICTFIFSILV LASTITILKD FSILLMEGVP KSLNYSGVKE LILAVDGVLS 
251 VHCLHIWSLT MNQVILSAHV ATAASRDSQV VRREIAKALS KSFTMHSLTI 
301 QMESPVDQDP DCLFCEDPCD 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_62f10, frame 2

PIR:S70632 zinc transporter ZnT-2 - rat, N = 1, Score = 884,
P = 1.5e-88

TREMBL:MMU76007_1 gene: "ZnT-3"; product: "ZnT-3"; Mus musculus zinc
transporter ZnT-3 (ZnT-3) mRNA, complete cds., N = 1, Score = 772,
P = 1.1e-76

TREMBL:HSU76010_1 gene: "ZnT-3"; product: "ZnT-3"; Human putative zinc
transporter ZnT-3 (ZnT-3) mRNA, complete cds., N = 1, Score = 742,
P = 1.6e-73

TREMBL:MMUZNT02_1 gene: "ZnT-3"; product: "zinc transporter"; Mus
musculus zinc transporter (ZnT-3) gene, complete cds., N = 1,
Score = 715, P = 1.2e-70

TREMBL:CET18D3_3 gene: "T18D3.3"; Caenorhabditis elegans cosmid T18D3,
N = 1, Score = 699, P = 5.9e-69



>PIR:S70632 zinc transporter ZnT-2 - rat 
Length = 359 

HSPs: 



Score = 884 (132.6 bits), Expect = 1.5e-88, P = 1.5e-88
Identities = 171/326 (52%), Positives = 230/326 (70%) 
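
The identity and positive percentages are ratios over the 326 aligned positions. The printed figures appear to be truncated rather than rounded: 230/326 is about 70.55%, which is shown as 70%. A small sketch reproducing the reported values under that assumed convention (`blast_percent` is a hypothetical helper):

```python
def blast_percent(count: int, aligned_length: int) -> int:
    """Percentage of aligned positions, truncated toward zero (assumed convention)."""
    return 100 * count // aligned_length


# Values from the HSP against PIR:S70632 (zinc transporter ZnT-2, rat)
print(blast_percent(171, 326))  # identities: 52
print(blast_percent(230, 326))  # positives: 70 (rounding would give 71)
```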



Query: 2 YHCHSGSKPTEKGANEYAYAKWKLCSASAICFIFMIAEVVGGHIAGSLAVVTDAAHLLID 61
++CH+ +E A+ KL ASAIC +FMI E++GG++A SLA++TDAAHLL D
Sbjct: 34 HYCHAQKDSGSHPNSEKQRARRKLYVASAICLVFMIGEIIGGYLAQSLAIMTDAAHLLTD 93

Query: 62 LTSFLLSLFSLWLSSKPPSKRLTFGWHRAEILGALLSILCIWVVTGVLVYLACERLLYPD 121
S L+SLFSLW+SS+P +K + FGW RAEILGALLS+L IWVVTGVLVYLA +RL+ D
Sbjct: 94 FASMLISLFSLWVSSRPATKTMNFGWQRAEILGALLSVLSIWVVTGVLVYLAVQRLISGD 153

Query: 122 YQIQATVMIIVSSCAVAANIVLTVVLHQRCLGHNH-------KEVQANASVRAAFVHAPG 174
Y+I+ M+I S CAVA NI++ + LHQ GH+H + Q N SVRAAF+H G
Sbjct: 154 YEIKGDTMLITSGCAVAVNIIMGLALHQSGHGHSHGHSHEDSSQQQQNPSVRAAFIHVVG 213

Query: 175 DLFQSISVLISALIIYFKPEYKIADPICTFIFSILVLASTITILKDFSILLMEGVPKSLN 234
DL QS+ VL++A IIYFKPEYK DPICTF+FSILVL +T+TIL+D ++LMEG PK ++
Sbjct: 214 DLLQSVGVLVAAYIIYFKPEYKYVDPICTFLFSILVLGTTLTILRDVILVLMEGTPKGVD 273

Query: 235 YSGVKELILAVDGVLSVHCLHIWSLTMNQVILSAHVATAASRDSQVVRREIAKALSKSFT 294
++ VK L+L+VDGV ++H LHIW+LT+ Q +LS H+A A + D+Q V + L F
Sbjct: 274 FTTVKNLLLSVDGVEALHSLHIWALTVAQPVLSVHIAIAQNVDAQAVLKVARDRLQGKFN 333

Query: 295 MHSLTIQMESPVDQDPDCLFCEDPCD 320

H++TIQ+ES + C C+ P + 
Sbjct: 334 FHTMTIQIESYSEDMKSCQECQGPSE 359 



Pedant information for DKFZphfbr2_62f10, frame 2



Report for DKFZphfbr2_62f10.2



[LENGTH] 320

[MW] 35053.51

[pI] 6.48

[HOMOL] PIR:S70632 zinc transporter ZnT-2 - rat 3e-84

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YMR243c] 2e-16

[FUNCAT] 13.01 homeostasis of metal ions [S. cerevisiae, YMR243c] 2e-16

[FUNCAT] 08.19 cellular import [S. cerevisiae, YMR243c] 2e-16

[FUNCAT] 11.07 detoxification [S. cerevisiae, YMR243c] 2e-16

[FUNCAT] 07.04.01 metal ion transporters (cu, fe, etc.) [S. cerevisiae, YMR243c] 2e-16

[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YOR316c] 3e-13

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YOR316c] 3e-13

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR205w] 4e-07

[PIRKW] transmembrane protein 2e-30

[PIRKW] mitochondrial inner membrane 6e-12

[PIRKW] mitochondrion 6e-12

[PIRKW] membrane protein 1e-11

[SUPFAM] zinc transporter ZnT-2 2e-30

[SUPFAM] membrane protein czcD 1e-11

[PROSITE] MYRISTYL 4

[PROSITE] CAMP_PHOSPHO_SITE

[PROSITE] CK2_PHOSPHO_SITE

[PROSITE] PROKAR_LIPOPROTEIN

[PROSITE] TYR_PHOSPHO_SITE

[PROSITE] PKC_PHOSPHO_SITE

[PROSITE] ASN_GLYCOSYLATION 2

[KW] TRANSMEMBRANE 5

[KW] LOW_COMPLEXITY 8.12 %



SEQ MYHCHSGSKPTEKGANEYAYAKWKLCSASAICFIFMIAEVVGGHIAGSLAVVTDAAHLLI

SEG xxx

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhh

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ DLTSFLLSLFSLWLSSKPPSKRLTFGWHRAEILGALLSILCIWVVTGVLVYLACERLLYP

SEG xxxxxxxxxxxxxxxxxxxxxxx

PRD hhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhc

MEM MMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ DYQIQATVMIIVSSCAVAANIVLTVVLHQRCLGHNHKEVQANASVRAAFVHAPGDLFQSI

SEG

PRD cccccccceeeehhhhhhhhhhhhhhhhhcccccccccccccchhhhhhhhhhhhhchhh

MEM MMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM

SEQ SVLISALIIYFKPEYKIADPICTFIFSILVLASTITILKDFSILLMEGVPKSLNYSGVKE

SEG

PRD hhhhhhhhhhcccceeeccchhhhhhhhhhhhhchhhhhhhheeeeeccccccchhhhhh

MEM MMMMMMMMMMMMMMMMMMMM

SEQ LILAVDGVLSVHCLHIWSLTMNQVILSAHVATAASRDSQVVRREIAKALSKSFTMHSLTI

SEG

PRD hhhhhhceeecccceeeeeccchhhhheeeeeccccchhhhhhhhhhhhhhhhcccccee

MEM

SEQ QMESPVDQDPDCLFCEDPCD

SEG

PRD eeeccccccccccccccccc

MEM



Prosite for DKFZphfbr2_62f10.2

PS00001 162->166 ASN_GLYCOSYLATION PDOC00001
PS00001 234->238 ASN_GLYCOSYLATION PDOC00001
PS00004 81->85 CAMP_PHOSPHO_SITE PDOC00004
PS00005 11->14 PKC_PHOSPHO_SITE PDOC00005
PS00005 75->78 PKC_PHOSPHO_SITE PDOC00005






WO 01/12659 



PS00005 80->83 PKC_PHOSPHO_SITE PDOC00005
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005
PS00006 304->308 CK2_PHOSPHO_SITE PDOC00006
PS00007 13->21 TYR_PHOSPHO_SITE PDOC00007
PS00008 7->13 MYRISTYL PDOC00008
PS00008 42->48 MYRISTYL PDOC00008
PS00008 94->100 MYRISTYL PDOC00008
PS00008 228->234 MYRISTYL PDOC00008
PS00013 125->136 PROKAR_LIPOPROTEIN PDOC00013
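
The PROSITE hits in the table above are pattern matches on the translated sequence. As an illustration only (not the Pedant or ScanProsite implementation), the three short patterns PS00001 (N-{P}-[ST]-{P}), PS00004 ([RK](2)-x-[ST]) and PS00005 ([ST]-x-[RK]) can be rewritten as Python regular expressions and scanned with overlapping matches. Reading each range as a 1-based start and an end one past the last matched residue (an inference from the listed coordinates), this sketch reproduces the ASN_GLYCOSYLATION, CAMP_PHOSPHO_SITE and PKC_PHOSPHO_SITE entries for DKFZphfbr2_62f10.2:

```python
import re

# Frame-2 peptide of DKFZphfbr2_62f10 (320 aa)
PEPTIDE = (
    "MYHCHSGSKPTEKGANEYAYAKWKLCSASAICFIFMIAEVVGGHIAGSLA"
    "VVTDAAHLLIDLTSFLLSLFSLWLSSKPPSKRLTFGWHRAEILGALLSIL"
    "CIWVVTGVLVYLACERLLYPDYQIQATVMIIVSSCAVAANIVLTVVLHQR"
    "CLGHNHKEVQANASVRAAFVHAPGDLFQSISVLISALIIYFKPEYKIADP"
    "ICTFIFSILVLASTITILKDFSILLMEGVPKSLNYSGVKELILAVDGVLS"
    "VHCLHIWSLTMNQVILSAHVATAASRDSQVVRREIAKALSKSFTMHSLTI"
    "QMESPVDQDPDCLFCEDPCD"
)

# PROSITE patterns rewritten by hand as regular expressions:
# {P} -> [^P] (any residue except P), x -> ., [RK](2) -> [RK]{2}
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",   # PS00004
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005
}


def scan(seq: str, regex: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs: 1-based start, end one past the last residue."""
    hits = []
    # Wrapping the pattern in a zero-width lookahead also reports overlapping matches
    for m in re.finditer(f"(?=({regex}))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


results = {name: scan(PEPTIDE, regex) for name, regex in PATTERNS.items()}
for name, hits in results.items():
    for start, end in hits:
        print(f"{start}->{end} {name}")
```

Longer patterns with variable gaps, for example x(4,6), map to `.{4,6}` in the same way.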



(No Pfam data available for DKFZphfbr2_62f10.2)
DKFZphfbr2_62n10



group: brain derived 

DKFZphfbr2_62n10 encodes a novel 541 amino acid protein with similarity to
Plasmodium vivax reticulocyte-binding protein 1.

The novel protein contains one leucine zipper, a motif involved in protein-protein interaction.
There are no informative BLAST results and no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 

similarity to reticulocyte-binding protein 
complete cDNA, complete cds, EST hits 
Sequenced by LMU 
Locus: /map="13 w 
Insert length: 3522 bp 

Poly A stretch at pos. 3503, polyadenylation signal at pos. 3479 

1 GGGGCGTGTT GGCGGGATTC TGAACGCTGC CATGGCTCAG ACCGTGTAGA 
51 ATGTTACATT GTCGCTCACT CTGCCCATCA CGTGCCACAT TTGCTTGGGG 

101 AAGGTACGTC AGCCTGTCAT ATGCATCAAC AACCATGTAT TTTGTTCGAT 

151 TTGTATTGAT TTGTGGTTGA AGAATAATAG CCAGTGTCCA GCTTGCAGAG 

201 TCCCCATCAC TCCTGAAAAT CCTTGCAAAG AAATTATAGG AGGAACAAGT 

251 GAAAGTGAAC CTATGCTAAG CCATACGGTC AGGAAGCATC TTCGGAAAAC 

301 TAGACTTGAA TTACTACACA AAGAATATGA GGACGAAATA GATTGTTTAC 

351 AGAAAGAAGT AGAAGAGCTT AAGAGTAAAA ATCTCAGCTT GGAGTCACAG 

401 ATCAAAGCTA TTCTGGATCC TTTAACCTTG GTGCAGGGCA ACCAAAATGA 

451 AGACAAACAT CTAGTCACAG ATAATCCAAG TATAATTAAC CCAGAAACTG 

501 TAGCAGAGTG GAAGAAAAAA CTCAGAACAG CTAATGAAAT CTATGAAAAA 

551 GTGAAAGATG ATGTGGATAA GCTAAAGGAG GCAAATAAAA AATTGAAATT 

601 GGAAAATGGT GGTCTGGTGA GGGAGAATTT ACGACTGAAG GCTGAAGTTG 

651 ATAACAGATC ACCTCAAAAG TTTGGAAGGT TTGCAGTTGC TGCTCTTCAG 

701 TCCAAAGTAG AACAGTATGA GCGTGAAACC AATCGCCTCA AGAAAGCCCT 

751 GGAACGAAGT GATAAGTATA TAGAGGAACT AGAATCTCAA GTTGCACAGC 

801 TAAAAAATTC AAGTGAAGAG AAAGAGGCTA TGAATTCCAT TTGCCAGACA 

851 GCACTTTCTG CAGATGGCAA AGGGAGCAAA GGCAGTGAGG AGGATGTGGT 

901 GTCAAAGAAT CAAGGCGATA GTGCCAGAAA GCAGCCTGGC TCATCCACCT 

951 CCAGTTCTTC TCACCTAGCG AAGCCTTCCA GCAGCAGACT GTGTGACACC 
1001 AGTTCTGCAA GGCAGGAAAG TACCAGCAAA GCAGACCTTA ACTGTTCTAA 
1051 GAACAAAGAC CTATATCAAG AACAGGTAGA AGTAATGTTA GATGTGACAG 
1101 ATACAAGTAT GGATACTTAT TTGGAAAGAG AATGGGGGAA TAAACCAAGT 
1151 GACTGTGTAC CCTACAAAGA TGAAGAACTT TATGATTTTC CAGCTCCTTG 
1201 TACTCCTTTG TCCCTTAGTT GCCTTCAGCT CAGTACTCCA GAAAATAGAG 
1251 AGAGCTCTGT GGTCCAAGCA GGAGGTTCCA AAAAGCACTC AAACCATCTC 
1301 AGAAAATTGG TGTTTGATGA TTTTTGTGAT TCTTCAAATG TTTCTAATAA 
1351 AGATTCTTCA GAAGATGATA TAAGTAGAAG TGAAAATGAG AAGAAATCAG 
1401 AATGTTTTTC TTCCACAAAG ACAGGATTTT GGGACTGTTG TTCCACAAGC 
1451 TATGCCCAAA ACTTAGATTT TGAAAGTTCA GAGGGGAACA CGATAGCAAA
1501 TTCTGTTGGA GAAATATCTT CAAAATTGAG TGAGAAATCA GGCTTATGTT 
1551 TATCCAAAAG GTTGAATTCT ATTCGCTCTT TTGAAATGAA CCGGACAAGA 
1601 ACATCCAGTG AAGCATCGAT GGATGCTGCT TACCTTGACA AAATCTCTGA 
1651 GTTGGATTCA ATGATGTCAG AGTCAGACAA CAGCAAGAGC CCTTGTAATA 
1701 ACGGTTTTAA GTCACTGGAT TTGGATGGGT TATCAAAGTC ATCTCAAGGC 
1751 AGTGAATTTC TTGAGGAACC TGATAAGTTG GAAGAAAAAA CTGAGCTAAA 
1801 CCTTTCCAAA GGTTCTCTAA CTAATGATCA GTTAGAAAAT GGAAGTGAAT 
1851 GGAAACCCAC TTCTTTTTTT TCTCCTCTCT CCATCTGACC AAGAAATGAA 
1901 TGAAGATTTT TCACTCCATT CCAGTTCTTG TCCAGTAACT AATGAAATCA 
1951 AACCCCCAAG CTGCTTGTTT CAGACAGAGT TTTCCCAGGG CATTTTGTTA 
2001 AGCAGTTCAC ATCGACTATT GGAAGATCAA AGATTTGGGT CATCTTTGTT 
2051 TAAGATGTCC TCAGAGATGC ACAGTCTTCA TAACCACCTT CAGTCTCCTT 
2101 GGTCTACTTC CTTTGTGCCT GAAAAGAGGA ATAAAAATGT GAATCAATCA 
2151 ACAAAAAGAA AAATCCAGAG CAGCCTTTCC AGTGCCAGCC CATCAAAAGC 
2201 AACTAAAAGT TGACTCATTA GAAAGGTGTC ATTTGTGGTT TTGTCCTGAG 
2251 AGAAATAGAA AAGTTGTTAA AGTTACCTTT TTTCCTCATA AAAGTTCTAT 
2301 ACAAATTGGA ATTGATAATC TTTAGTCAAG TATCAAGTCA GGATGGTGGA 
2351 TTAACCTGTA CCCAGAATAC TTATTGTTCA TTTTGAAAAG ACTTTGTTCT 
2401 TTTCATTTTT ATTTGGGAGT CTTTGTGACC AGAGAAGTTA GGGAGGAGGT 
2451 TATTTTTGTG TTTTGGGGTT GGTTGGTTGG TTGGTTTTGT TTTTGGTTTT 
2501 GTTTTTTTAC TGAATTTGAT ATGTATCTCG GTTGGATATA CATTGTTTTT 
2551 TTAAAAAATG TTATTTAACT GTTAGATACA GTGGCCTGTT GATAAGCCCC 
2601 ACTTGTCTTC AGAACTTGGA TTTCTTAAAT AAAACTTTTA GTGTTGTCTA
2651 TACACTGCTC AATAAGACAC TTGAGTTTAA GCTTTTCCCA GGGTGGAAAT
2701 TATTTTACCT GTCCCTTTTT ATTTATGTTT AGTGATGGCC TAGTTTTTCT
2751 GCAGGGCCAT GATGGAGAAA TAGCACTCTA GCCTTAGTCC AATATTGATT
2801 TACTTTCTTT TTTTAGGTTT TATGTATATG TTTGCATTTT TTAGCATTGT
2851 GTTTTGTCCA GTTTTGTGAA AATGTTCTGC TAGTATGAAA GAAAACATTT
2901 TCTATATGAA GACATTTGTT TTATGTTAGG TAGCTTACAT TTTCTCCTCT
2951 GCGTGTGTGT GTATGTGTGT AAAATCAGAA ATTTAGCATA CTATGGAAAG
3001 AAGGCATGGA GCACTTGGGT TTAGAGGAAC CTAAAACATC ATAGCTTCAT
3051 TGTTCCAGAT GTAACAGGTT TGAAAGAGCT CATCGCCAAG TTCTTGATCC
3101 ACTTGCATTC CAGGGGAGTT CTCTTTTGAG TAGTATGTTT CTTGTTTGCA
3151 TGTTCCTGTT CTTTGTGGAA ACTATGCATG GTAGCATTTT TGCTTGCTGT
3201 GTTTTCCATA CTTAAGAAAA AGAGGTTTCA GTTGGCTGAT AGAATATCTT
3251 TTATGTAGGA CAAAACTTTT CTGTGAAGAG TGTTGAGGGG GTGAAGATAG
3301 GTAAGAGGTA AGCACAATTT TTAATTTAGG CTCTGAAAAA GTGTATTGTT
3351 CTAAACGTAT TTGGTATGCC TATATAGGTC TTTAAAAATG GGTTTGTATG
3401 CTGTTTAATG TGCACTGAAC ATTTTACATT AATATTGTAC TGTTTTACAT
3451 TAATACTGCA TGCTTTTCTA TGTGAATTGA ATAAAGAATG TCATAAGCAC
3501 TGGAAAAAAA AAAAAAAAAA AA



BLAST Results 



Entry HS658254 from database EMBL: 
human STS SHGC-11774. 
Score = 1643, P = 8.0e-67, identities = 345/355

Entry HS513217 from database EMBL:
human STS SHGC-14656.
Score = 1193, P = 5.8e-46, identities = 241/244



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 263 bp to 1885 bp; peptide length: 541 
Category: similarity to known protein 



1 MLSHTVRKHL RKTRLELLHK EYEDEIDCLQ KEVEELKSKN LSLESQIKAI
51 LDPLTLVQGN QNEDKHLVTD NPSIINPETV AEWKKKLRTA NEIYEKVKDD
101 VDKLKEANKK LKLENGGLVR ENLRLKAEVD NRSPQKFGRF AVAALQSKVE
151 QYERETNRLK KALERSDKYI EELESQVAQL KNSSEEKEAM NSICQTALSA
201 DGKGSKGSEE DVVSKNQGDS ARKQPGSSTS SSSHLAKPSS SRLCDTSSAR
251 QESTSKADLN CSKNKDLYQE QVEVMLDVTD TSMDTYLERE WGNKPSDCVP
301 YKDEELYDFP APCTPLSLSC LQLSTPENRE SSVVQAGGSK KHSNHLRKLV
351 FDDFCDSSNV SNKDSSEDDI SRSENEKKSE CFSSTKTGFW DCCSTSYAQN
401 LDFESSEGNT IANSVGEISS KLSEKSGLCL SKRLNSIRSF EMNRTRTSSE
451 ASMDAAYLDK ISELDSMMSE SDNSKSPCNN GFKSLDLDGL SKSSQGSEFL
501 EEPDKLEEKT ELNLSKGSLT NDQLENGSEW KPTSFFSPLS I



BLASTP hits 



Entry A42771 from database PIR: 

reticulocyte-binding protein 1 - Plasmodium vivax 

Score = 127, P = 3.7e-08, identities = 68/300, positives = 145/300

Entry RBP1_PLAVB from database SWISSPROT:
RETICULOCYTE BINDING PROTEIN 1 PRECURSOR.

Score = 127, P = 3.9e-08, identities = 68/300, positives = 145/300

Entry MMDSPPG_1 from database TREMBL:

gene: "DSPP"; product: "dentin sialophosphoprotein"; Mus musculus DSPP
gene

Score = 160, P = 5.2e-08, identities = 87/373, positives = 146/373



Alert BLASTP hits for DKFZphfbr2_62n10, frame 2
No Alert BLASTP hits found 






PCT/IB00/01496



Pedant information for DKFZphfbr2_62n10, frame 2



Report for DKFZphfbr2_62n10.2



[LENGTH] 541 

[MW] 60533.06 

[pI] 5.10

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YKR092c] 3e-05

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR092c] 3e-05

[PROSITE] LEUCINE_ZIPPER 1

[PROSITE] MYRISTYL 7

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 18 

[PROSITE] PROKAR_LIPOPROTEIN 1 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 14 

[PROSITE] ASN_GLYCOSYLATION 7 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 9.24 % 

[KW] COILED_COIL 22.55 %



SEQ MLSHTVRKHLRKTRLELLHKEYEDEIDCLQKEVEELKSKNLSLESQIKAILDPLTLVQGN 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhcccccccccc 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ QNEDKHLVTDNPSIINPETVAEWKKKLRTANEIYEKVKDDVDKLKEANKKLKLENGGLVR

SEG xxxxxxxxxxxxxxxxxxxx

PRD cccceeeeeccccccccchhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhcccceee

COILS cccccccccccccccccccccccccccccccccccccc

SEQ ENLRLKAEVDNRSPQKFGRFAVAALQSKVEQYERETNRLKKALERSDKYIEELESQVAQL

SEG

PRD ehhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

COILS cccccccccccccccccccccccccccccccccc

SEQ KNSSEEKEAMNSICQTALSADGKGSKGSEEDVVSKNQGDSARKQPGSSTSSSSHLAKPSS

SEG xxxxxxxxxxxxxx 

PRD hcchhhhhhhhhhhhhhhccccccccccceeeeecccccccccccccccccccccccccc 

COILS CCCCCC 



SEQ SRLCDTSSARQESTSKADLNCSKNKDLYQEQVEVMLDVTDTSMDTYLEREWGNKPSDCVP 

SEG x 

PRD ccccccccccccccccccccccccchhhhhhhhhcccccccccchhhhhhhccccccccc 

COILS 

SEQ YKDEELYDFPAPCTPLSLSCLQLSTPENRESSVVQAGGSKKHSNHLRKLVFDDFCDSSNV 

SEG 

PRD cccccccccccccccccceeeecccccccceeeeeccccccccccccccccccccccccc 

COILS 

SEQ SNKDSSEDDISRSENEKKSECFSSTKTGFWDCCSTSYAQNLDFESSEGNTIANSVGEISS 

SEG 

PRD cccccccchhhhhccccccccccccccccccccccccccccccccccccccccccccccc 

COILS 



SEQ KLSEKSGLCLSKRLNSIRSFEMNRTRTSSEASMDAAYLDKISELDSMMSESDNSKSPCNN 

SEG 

PRD ccccccccchhhhhcccccccccccchhhhhhhhhhhhhhhhhccccccccccccccccc 

COILS 



SEQ GFKSLDLDGLSKSSQGSEFLEEPDKLEEKTELNLSKGSLTNDQLENGSEWKPTSFFSPLS 

SEG xxxxxxxxxxxxxxx

PRD ccccccccccccccccceeecccchhhhhhhhhccccccccccccccccccccccccccc 

COILS 



SEQ I 
SEG 

PRD c 
COILS 



Prosite for DKFZphfbr2_62n10.2



PS00001 40->44 ASN_GLYCOSYLATION PDOC00001

PS00001 182->186 ASN_GLYCOSYLATION PDOC00001

PS00001 260->264 ASN_GLYCOSYLATION PDOC00001
(No Pfam data available for DKFZphfbr2_62n10.2)
DKFZphfbr2_62o17



group: metabolism 

DKFZphfbr2_62o17.2 encodes a novel 282 amino acid protein with weak similarity to the
apolipoprotein E receptor.

The new protein contains a leucine zipper, a motif involved in protein-protein interaction,
and three LDL-receptor class A domain (LDLRA_1) patterns. In LDL receptors, the class A
domains form the binding site for LDL and calcium; the acidic residues between the fourth and
sixth cysteines are important for high-affinity binding of positively charged sequences in
LDLR ligands.
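
To make the cysteine numbering concrete, the sketch below extracts each class A module, finds its six cysteines and counts the acidic residues (Asp, Glu) lying strictly between the fourth and sixth cysteine. The module boundaries, 52-91 and 130-169, are taken from the Pfam alignments reported for this clone further on; the helper `between_c4_c6` is hypothetical and only illustrates the counting, not a published scoring rule.

```python
# Frame-2 peptide of DKFZphfbr2_62o17 (282 aa)
PEPTIDE = (
    "MSGGWMAQVGAWRTGALGLALLLLLGLGLGLEAAASPLSTPTSAQAAGPS"
    "SGSCPPTKFQCRTSGLCVPLTWRCDRDLDCSDGSDEEECRIEPCTQKGQC"
    "PPPPGLPCPCTGVSDCSGGTDKKLRNCSRLACLAGELRCTLSDDCIPLTW"
    "RCDGHPDCPDSSDELGCGTNEILPEGDATTMGPPVTLESVTSLRNATTMG"
    "PPVTLESVPSVGNATSSSAGDQSGSPTAYGVIAAAAVLSASLVTATLLLL"
    "SWLRAQERLRPLGLLVAMKESLLLSEQKTSLP"
)

# Class A module boundaries (1-based, inclusive) from the Pfam hits
MODULES = [(52, 91), (130, 169)]


def between_c4_c6(module: str) -> str:
    """Residues strictly between the 4th and 6th cysteine of one module."""
    cys = [i for i, aa in enumerate(module) if aa == "C"]
    if len(cys) < 6:
        raise ValueError("module has fewer than six cysteines")
    return module[cys[3] + 1 : cys[5]]


summary = []
for first, last in MODULES:
    stretch = between_c4_c6(PEPTIDE[first - 1 : last])
    acidic = sum(aa in "DE" for aa in stretch)  # count Asp + Glu
    summary.append((first, last, acidic))
    print(f"{first}-{last}: {stretch} ({acidic} acidic residues)")
```

Note that the stretch between the fourth and sixth cysteine also contains the fifth cysteine.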

The new protein can find application in modulating cholesterol binding and transport by
LDL receptors and LDL-binding proteins.



similarity to apolipoprotein E receptor 

complete cDNA, complete cds; start at bp 56 matches Kozak consensus
ANCatg; EST hits
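
The start-codon call can be checked against the nucleotide listing: the ATG at position 56 is preceded by A at -3 and C at -1 (the "ANCatg" context noted above), and translating from there reproduces the N-terminus of the frame-2 peptide. A minimal sketch using the standard genetic code; only the first 100 bp of the insert are embedded, and the helper names `kozak_context` and `translate` are illustrative:

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (first base varies slowest)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# First 100 bp of the DKFZphfbr2_62o17 insert
INSERT = (
    "GGGGGATAAGAGAGCGGTCTGGACAGCGCGTGGCCGGCGCCGCTGTGGGG"
    "ACAGCATGAGCGGCGGTTGGATGGCGCAGGTTGGAGCGTGGCGAACAGGG"
)


def kozak_context(seq: str, atg_pos: int) -> tuple[str, str]:
    """Bases at -3 and -1 relative to the A of an ATG at 1-based atg_pos."""
    i = atg_pos - 1
    if seq[i:i + 3] != "ATG":
        raise ValueError(f"no ATG at position {atg_pos}")
    return seq[i - 3], seq[i - 1]


def translate(seq: str, start: int) -> str:
    """Translate complete codons from 1-based start until a stop codon or the end."""
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(kozak_context(INSERT, 56))  # ('A', 'C')
print(translate(INSERT, 56))      # MSGGWMAQVGAWRTG
```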

Sequenced by LMU 

Locus: unknown

Insert length: 1260 bp 

Poly A stretch at pos. 1240, polyadenylation signal at pos. 1218



1 GGGGGATAAG AGAGCGGTCT GGACAGCGCG TGGCCGGCGC CGCTGTGGGG 

51 ACAGCATGAG CGGCGGTTGG ATGGCGCAGG TTGGAGCGTG GCGAACAGGG 

101 GCTCTGGGCC TGGCGCTGCT GCTGCTGCTC GGCCTCGGAC TAGGCCTGGA 

151 GGCCGCCGCG AGCCCGCTTT CCACCCCGAC CTCTGCCCAG GCCGCAGGCC 

201 CCAGCTCAGG CTCGTGCCCA CCCACCAAGT TCCAGTGCCG CACCAGTGGC 

251 TTATGCGTGC CCCTCACCTG GCGCTGCGAC AGGGACTTGG ACTGCAGCGA 

301 TGGCAGCGAT GAGGAGGAGT GCAGGATTGA GCCATGTACC CAGAAAGGGC 

351 AATGCCCACC GCCCCCTGGC CTCCCCTGCC CCTGCACCGG CGTCAGTGAC 

401 TGCTCTGGGG GAACTGACAA GAAACTGCGC AACTGCAGCC GCCTGGCCTG 

451 CCTAGCAGGC GAGCTCCGTT GCACGCTGAG CGATGACTGC ATTCCACTCA 

501 CGTGGCGCTG CGACGGCCAC CCAGACTGTC CCGACTCCAG CGACGAGCTC 

551 GGCTGTGGAA CCAATGAGAT CCTCCCGGAA GGGGATGCCA CAACCATGGG 

601 GCCCCCTGTG ACCCTGGAGA GCGTCACCTC TCTCAGGAAT GCCACAACCA 

651 TGGGGCCCCC TGTGACCCTG GAGAGTGTCC CCTCTGTCGG GAATGCCACA 

701 TCCTCCTCTG CCGGAGACCA GTCTGGAAGC CCAACTGCCT ATGGGGTTAT 

751 TGCAGCTGCT GCGGTGCTCA GTGCAAGCCT GGTCACCGCC ACCCTCCTCC 

801 TTTTGTCCTG GCTCCGAGCC CAGGAGCGCC TCCGCCCACT GGGGTTACTG 

851 GTGGCCATGA AGGAGTCCCT GCTGCTGTCA GAACAGAAGA CCTCGCTGCC 

901 CTGAGGACAA GCACTTGCCA CCACCGTCAC TCAGCCCTGG GCGTAGCCGG 

951 ACAGGAGGAG AGCAGTGATG CGGATGGGTA CCCGGGCACA CCAGCCCTCA 

1001 GAGACCTGAG CTCTTCTGGC CACGTGGAAC CTCGAACCCG AGCTCCTGCA 

1051 GAAGTGGCCC TGGAGATTGA GGGTCCCTGG ACACTCCCTA TGGAGATCCG 

1101 GGGAGCTAGG ATGGGGAACC TGCCACAGCC AGAACCGAGG GGCTGGCCCC 

1151 AGGCAGCTCC CAGGGGGTAG GACGGCCCTG TGCTTAAGAC ACTCCTGCTG 

1201 CCCCGTCTGA GGGTGGCGAT TAAAGTTGCT TCACATCCTC AAAAAAAAAA 
1251 AAAAAAAAAC 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 56 bp to 901 bp; peptide length: 282 

Category: similarity to known protein 

Classification: unset 

Prosite motifs: LDLRA_1 (67-90) 

LDLRA_1 (67-90) 

LDLRA_1 (145-168)
LEUCINE_ZIPPER (17-39)



1 MSGGWMAQVG AWRTGALGLA LLLLLGLGLG LEAAASPLST PTSAQAAGPS 
51 SGSCPPTKFQ CRTSGLCVPL TWRCDRDLDC SDGSDEEECR IEPCTQKGQC 
101 PPPPGLPCPC TGVSDCSGGT DKKLRNCSRL ACLAGELRCT LSDDCIPLTW 
151 RCDGHPDCPD SSDELGCGTN EILPEGDATT MGPPVTLESV TSLRNATTMG 
201 PPVTLESVPS VGNATSSSAG DQSGSPTAYG VIAAAAVLSA SLVTATLLLL 
251 SWLRAQERLR PLGLLVAMKE SLLLSEQKTS LP 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_62o17, frame 2

TREMBL:AF110520_6 product: "NG29"; Mus musculus major
histocompatibility complex region NG27, NG28, RPS28, NADH
oxidoreductase, NG29, KIFC1, Fas-binding protein, BING1, tapasin,
RalGDS-like, KE2, BING4, beta 1,3-galactosyl transferase, and RPS18
genes, complete cds; Sacm21 gene, partial cds; and unknown gene.,
N = 1, Score = 733, P = 1.5e-72

PIR:JE0237 apolipoprotein E receptor 2 precursor - mouse, N = 2,
Score = 290, P = 1.1e-26

TREMBL:HSZ75190_1 product: "apolipoprotein E receptor 2 906";
H. sapiens mRNA for apolipoprotein E receptor 2, N = 1, Score = 279,
P = 1.8e-23



>TREMBL:AF110520_6 product: "NG29"; Mus musculus major histocompatibility
complex region NG27, NG28, RPS28, NADH oxidoreductase, NG29, KIFC1,
Fas-binding protein, BING1, tapasin, RalGDS-like, KE2, BING4, beta
1,3-galactosyl transferase, and RPS18 genes, complete cds; Sacm21 gene,
partial cds; and unknown gene.
Length = 260



HSPs: 



Score = 733 (110.0 bits), Expect = 1.5e-72, P = 1.5e-72
Identities = 157/276 (56%), Positives = 178/276 (64%)

Query: 6 MAQVGAWRTGALGLALLLLLGLGLGLEAAASPLSTPTSAQAAGPSSGSCPPTKFQCRTSG 65
MA+ GA R ALGL L LL GL GLEAA +P T Q +G + SCP FQC TSG
Sbjct: 1 MARGGAGRAVALGLVLRLLFGLRTGLEAAPAPAHT--RVQVSGSRADSCPTDTFQCLTSG 58

Query: 66 LCVPLTWRCDRDLDCSDGSDEEECRIEPCTQKGQCPPPPGLPCPCTGVSDCSGGTDKKLR 125
CVPL+WRCD D DCSDGSDEE+CRIE C Q GQC P LPC C +S CS +DK L
Sbjct: 59 YCVPLSWRCDGDQDCSDGSDEEDCRIESCAQNGQCQPQSALPCSCDNISGCSDVSDKNL- 117

Query: 126 NCSRLACLAGELRCTLSDDCIPLTWRCDGHPDCPDSSDELGCGTNEILPEGDATTMGPPV 185
NCSR C EL C L D CIP TWRCDGHPDC DSSDEL C T+
Sbjct: 118 NCSRPPCQESELHCILDDVCIPHTWRCDGHPDCLDSSDELSCDTD T 163

Query: 186 TLESVTSLRNATTMGPPVTLESVPSVGNATSSSAGDQSGSPTAYGVIAAAAVLSASLVTA 245
++ + NATT T+E+ S NT +SAGD S +P+AYGVIAAA VLSA LV+A
Sbjct: 164 EIDKIFQEENATTTRISTTMENETSFRNVTFTSAGDSSRNPSAYGVIAAAGVLSAILVSA 223

Query: 246 TLLLLSWLRAQERLRPLGLLVAMKESLLLSEQKTSL 281
TLL+L LR Q LP GLLVA+KESLLLSE+KTSL
Sbjct: 224 TLLILLRLRGQGYLPPPGLLVAVKESLLLSERKTSL 259



Pedant information for DKFZphfbr2_62o17, frame 2



Report for DKFZphfbr2_62o17.2



[LENGTH] 282 

[MW] 28991.19 

[pI] 4.61

[HOMOL] TREMBL:AF110520_6 product: "NG29"; Mus musculus major histocompatibility
complex region NG27, NG28, RPS28, NADH oxidoreductase, NG29, KIFC1, Fas-binding protein,
BING1, tapasin, RalGDS-like, KE2, BING4, beta 1,3-galactosyl transferase, and RPS18 genes,
complete cds; Sacm21 gene, partial cds; and unknown gene. 5e-55

[BLOCKS] BL01209 LDL-receptor class A (LDLRA) domain proteins

[SCOP] d1ajj 7.11.1.1.1 Ligand-binding domain of low-density lipoprotei 2e-10

[PIRKW] duplication 1e-19

[PIRKW] tandem repeat 1e-15

[PIRKW] heterodimer 6e-18

[PIRKW] endocytosis 4e-18

[PIRKW] heparan sulfate 2e-12

[PIRKW] VLDL 1e-19

[PIRKW] transmembrane protein 1e-19

[PIRKW] coated pits 4e-18

[PIRKW] fatty acid metabolism 1e-19

[PIRKW] G protein-coupled receptor 1e-10

[PIRKW] receptor 1e-19

[PIRKW] glycoprotein 1e-19

[PIRKW] lipid transport 4e-18

[PIRKW] LDL 5e-14

[PIRKW] calcium binding 6e-18

[PIRKW] extracellular protein 6e-13

[PIRKW] alternative splicing 1e-19

[PIRKW] extracellular matrix 3e-10

[PIRKW] chondroitin sulfate proteoglycan 2e-12

[PIRKW] cholesterol 4e-18

[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 1e-10

[SUPFAM] LDL receptor YWTD-containing repeat homology 1e-19

[SUPFAM] trypsin homology 6e-13

[SUPFAM] alpha-2-macroglobulin receptor 6e-18

[SUPFAM] LDL receptor 1e-19

[SUPFAM] LDL receptor ligand-binding repeat homology 1e-19

[SUPFAM] EGF homology 1e-19

[PROSITE] LDLRA_1 3

[PROSITE] LEUCINE_ZIPPER 1

[PFAM] Low-density lipoprotein receptor domain class A

[PFAM] TNFR/NGFR cysteine-rich region

[KW] SIGNAL_PEPTIDE 31

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 22.34 %



SEQ MSGGWMAQVGAWRTGALGLALLLLLGLGLGLEAAASPLSTPTSAQAAGPSSGSCPPTKFQ

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccceee

MEM

SEQ CRTSGLCVPLTWRCDRDLDCSDGSDEEECRIEPCTQKGQCPPPPGLPCPCTGVSDCSGGT

SEG xxxxxxxxxxx

PRD ecccccceeeeecccccccccccccccccccccccccccccccccccccccccccccccc

MEM

SEQ DKKLRNCSRLACLAGELRCTLSDDCIPLTWRCDGHPDCPDSSDELGCGTNEILPEGDATT

SEG

PRD cccccccccccccccceeeccccccccccccccccccccccccccccccccccccccccc

MEM



SEQ MGPPVTLESVTSLRNATTMGPPVTLESVPSVGNATSSSAGDQSGSPTAYGVIAAAAVLSA 

SEG xxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhh 

MEM MMMMMMM 

SEQ SLVTATLLLLSWLRAQERLRPLGLLVAMKESLLLSEQKTSLP 

SEG xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhcccccc 

MEM MMMMMMMMMM 



Prosite for DKFZphfbr2_62o17.2

PS01209 67->90 LDLRA_1 PDOC00929
PS01209 67->90 LDLRA_1 PDOC00929
PS01209 145->168 LDLRA_1 PDOC00929
PS00029 17->39 LEUCINE_ZIPPER PDOC00029
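
The leucine-zipper call rests on PROSITE PS00029, L-x(6)-L-x(6)-L-x(6)-L: four leucines spaced seven residues apart, that is, one leucine per heptad. A minimal regex sketch, using the same start/end convention as the table above (end one past the last matched residue), locates the single hit in the N-terminal 60 residues; the variable names are illustrative:

```python
import re

# N-terminal 60 residues of the DKFZphfbr2_62o17 frame-2 peptide
N_TERMINUS = "MSGGWMAQVGAWRTGALGLALLLLLGLGLGLEAAASPLSTPTSAQAAGPSSGSCPPTKFQ"

# PS00029 as a regex; the zero-width lookahead lets overlapping matches be reported
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")

hits = [
    (m.start() + 1, m.start() + 1 + len(m.group(1)))  # 1-based start, end one past
    for m in LEUCINE_ZIPPER.finditer(N_TERMINUS)
]
print(hits)  # [(17, 39)]
```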



Pfam for DKFZphfbr2_62o17.2



HMM_NAME 

HMM 

Query 



TNFR/NGFR cysteine-rich region 



54 



* CpeGt Y t D . WNHvpqC 1 pC . t r Ce PEMGQ YMvq PCTwTQNT . VC * 
CP+ ++ + + C+P RC+ ++ +C + ++ +C 
CPPTKFQCRTS— GLCVPLTWRCDR— DL DCSDGSDEEEC 



89 
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HMM_NAME Low-density lipoprotein receptor domain class A 

HMM I'tTCeGPDEFQCgSGeMRCIPMsWvCDGDpDCeDWSDEWPeNChp* 

. . C P +FQC+++ C+P+ W+CD D DC D+SDE E+C+ 
Query 52 GSCP-PTKFQCRTSG-LCVPLTWRCDRDLDCSDGSDE — EECRI 91 

54.99 (bits) f: 130 t: 169 Target: dkfzphfbr2_62ol7.2 similarity to apolipoprotein E receptor

Alignment to HMM consensus: 
Query *tTCeGPDEFQCgSGeMRCIPMsWvCDGDpDCeDWSDEWPeNChp* 
C + E +C + CIP+ W+CDG PDC D SDE ++C+ 

dkfzphfbr2 130 LACL-AGELRCTLSD-DCIPLTWRCDGHPDCPDSSDE--LGCGT 169






WO 01/12659 



PCT/IB00/01496 



DKFZphfbr2_64al5 



group: nucleic acid management 

DKFZphfbr2_64al5 encodes a novel 255 amino acid protein with strong similarity to inorganic pyrophosphatases.

Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) is the enzyme responsible for the hydrolysis of
pyrophosphate (PPi), which is formed as a product of the many biosynthetic reactions that
use ATP. All known PPases require divalent metal cations, with magnesium conferring the
highest activity.
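Written as a net reaction, the hydrolysis described above is shown below; Mg2+ is placed over the arrow as the preferred divalent cofactor rather than as a stoichiometric reactant:

```latex
\mathrm{PP_i} + \mathrm{H_2O} \xrightarrow{\ \mathrm{Mg}^{2+}\ } 2\,\mathrm{P_i}
```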

The new protein can find application as a new enzyme for biotechnological processes.



strong similarity to inorganic pyrophosphatases 

unspliced intron 212-256, see EST HS1190948

Sequenced by Qiagen 

Locus: unknown

Insert length: 1188 bp 

Poly A stretch at pos. 1170, polyadenylation signal at pos. 1151



1 GGGGGTTGGG GACCAGTGCA GGGACCGGGT CGCGCCGTGC TATGGCCCTG 
51 TACCACACTG AGGAGCGCGG CCAGCCCTGC TCGCAGAATT ACCGCCTCTT 
101 CTTTAAGAAT GTAACTGGTC ACTACATTTC CCCCTTTCAT GATATTCCTC 
151 TGAAGGTGAA CTCTAAAGAG GACACTGAGG CTCAAGGCAT TTTTATAGAC 
201 TTGTCTAAGA TCTGGAAAAT GGCATTCCTA TGAAGAAAGC ACGAAATGAT 
251 GAATATGAGA ATCTGTTTAA TATGATTGTA GAAATACCTC GGTGGACAAA 
301 GGCTAAAATG GAGATTGCCA CCAAGGAGCC AATGAATCCC ATTAAACAAT 
351 ATGTAAAGGA TGGAAAGCTA CGCTATGTGG CGAATATCTT CCCTTACAAG 
401 GGTTATATAT GGAATTATGG TACCCTCCCT CAGACTTGGG AAGATCCCCA 
451 TGAAAAAGAT AAGAGCACGA ACTGCTTTGG AGATAATGAT CCTATTGATG 
501 TTTGCGAAAT AGGCTCAAAG ATTCTTTCTT GTGGAGAAGT TATTCATGTG 
551 AAGATCCTTG GAATTTTGGC TCTTATTGAT GAAGGTGAAA CAGATTGGAA 
601 ATTAATTGCT ATCAATGCGA ATGATCCTGA AGCCTCAAAG TTTCATGATA 
651 TTGATGATGT TAAGAAGTTC AAACCGGGTT ACCTGGAAGC TACTCTTAAT 
701 TGGTTTAGAT TATGTAAGGT ACCAGATGGA AAACCAGAAA ACCAGTTTGC 
751 TTTTAATGGA GAATTCAAAA ACAAGGCTTT TGCTCTTGAA GTTATTAAAT 
801 CCACTCATCA ATGTTGGAAA GCATTGCTTA TGAAGAACTG TAATGGAGGA 
851 GCTACAAATT GCACAAACGT GCAGATATCT GATAGCCCTT TCCGTTGCAC 
901 TCAAGAGGAA GCAAGATCAT TAGTTGAATC GGTATCATCT TCACCAAATA 
951 AAGAAAGTAA TGAAGAAGAG CAAGTGTGGC ACTTCCTTGG CAAGTGATTG 
1001 AAACATCTGA AATTCTGCTG TCAAGATTCC CATCTCTAAG GACTCCAAGA 
1051 CTCTTTTTCC CCAAGTGCTA GAGACAAGGG GGTCTATGAG CATTTACTGA 
1101 CTTCCTGTTA AAACTTCATT TTTTCAAACT TTTTGAGCTA TGCAATATAT 
1151 AAATAAACAG TAAGAATTTT AAAAAAAAAA AAAAAAAA 



BLAST Results 



Entry HSPPASEMR from database EMBL: 

H. sapiens partial mRNA for pyrophosphatase. 

Score = 1706, P = 1.6e-70, identities = 342/343



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 230 bp to 994 bp; peptide length: 255 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: PPASE (85-92) 
1 MKKARNDEYE NLFNMIVEIP RWTKAKMEIA TKEPMNPIKQ YVKDGKLRYV
51 ANIFPYKGYI WNYGTLPQTW EDPHEKDKST NCFGDNDPID VCEIGSKILS
101 CGEVIHVKIL GILALIDEGE TDWKLIAINA NDPEASKFHD IDDVKKFKPG
151 YLEATLNWFR LCKVPDGKPE NQFAFNGEFK NKAFALEVIK STHQCWKALL
201 MKNCNGGATN CTNVQISDSP FRCTQEEARS LVESVSSSPN KESNEEEQVW
251 HFLGK
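The peptide sequences in these blocks are conceptual translations of the cDNA in the stated reading frame. The following is an illustrative sketch, not the software that generated these reports. It applies the standard genetic code to nucleotides 221 to 260 of DKFZphfbr2_64al5 and reproduces the first ten residues of the frame-2 peptide, starting at the ATG at position 230. The codon table is built from the usual TCAG-ordered 64-letter string; `translate`, `CODON_TABLE` and `fragment` are names introduced here for illustration only:

```python
# Standard genetic code: amino acids listed for codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cdna: str, start: int, n_codons: int) -> str:
    """Translate up to n_codons codons beginning at 1-based position `start`."""
    seq = cdna.upper()
    protein = []
    for c in range(n_codons):
        codon = seq[start - 1 + 3 * c : start + 2 + 3 * c]
        if len(codon) < 3:          # ran off the end of the fragment
            break
        protein.append(CODON_TABLE.get(codon, "X"))  # X for ambiguous bases
    return "".join(protein)


# Nucleotides 221-260 of DKFZphfbr2_64al5, copied from the listing above;
# insert position 230 is the 10th base of this fragment
fragment = "GGCATTCCTA" "TGAAGAAAGC" "ACGAAATGAT" "GAATATGAGA"
print(translate(fragment, 10, 10))  # MKKARNDEYE
```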



BLASTP hits 



Entry IPYR_KLULA from database SWISSPROT: 

INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE PHOSPHO-HYDROLASE) (PPASE).

Score = 689, P = 6.0e-68, identities = 128/248, positives = 170/248

Entry A45153 from database PIR: 

inorganic pyrophosphatase (EC 3.6.1.1) - bovine 

Score = 862, P = 2.8e-86, identities = 146/226, positives = 190/226 
Entry AF085600_1 from database TREMBLNEW: 

gene: "Nurf-38"; product: "inorganic pyrophosphatase NURF-38";
Drosophila melanogaster inorganic pyrophosphatase NURF-38 (Nurf-38) 
gene, complete cds. 

Score = 731, P = 2.1e-72, identities = 134/248, positives = 177/248
Entry PWBY from database PIR: 

inorganic pyrophosphatase (EC 3.6.1.1) - yeast (Saccharomyces
cerevisiae) 

Score = 688, P = 7.7e-68, identities = 133/251, positives = 174/251



Alert BLASTP hits for DKFZphfbr2_64al5, frame 2 

SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1)
(PYROPHOSPHATE PHOSPHO-HYDROLASE) (PPASE)., N = 1, Score = 731, P =
2.4e-72



>SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE
PHOSPHO-HYDROLASE) (PPASE).
Length = 290 

HSPs: 



Score = 731 (109.7 bits), Expect = 2.4e-72, P = 2.4e-72
Identities = 134/248 (54%), Positives = 177/248 (71%)



Query:   7 DEYENLFNMIVEIPRWTKAKMEIATKEPMNPIKQYVKDGKLRYVANIFPYKGYIWNYGTL 66
           +E + ++NM+VE+PRWT AKMEI+ K PMNPIKQ +K GKLR+VAN FP+KGYIWNYG L
Sbjct:  40 NEEKTIYNMVVEVPRWTNAKMEISLKTPMNPIKQDIKKGKLRFVANCFPHKGYIWNYGAL 99

Query:  67 PQTWEDPHEKDKSTNCFGDNDPIDVCEIGSKILSCGEVIHVKILGILALIDEGETDWKLI 126
           PQTWE+P   + ST C GDNDPIDV EIG ++   G+V+ VK+LG  ALIDEGETDWK+I
Sbjct: 100 PQTWENPDHIEPSTGCKGDNDPIDVIEIGYRVAKRGDVLKVKVLGQFALIDEGETDWKII 159

Query: 127 AINANDPEASKFHDIDDVKKFKPGYLEATLNWFRLCKVPDGKPENQFAFNGEFKNKAFAL 186
           AI+ NDP ASK +DI DV ++ PG L AT+ WF++ K+PDGKPENQFAFNG+ KN  FA
Sbjct: 160 AIDVNDPLASKVNDIADVDQYFPGLLRATVEWFKIYKIPDGKPENQFAFNGDAKNADFAN 219

Query: 187 EVIKSTHQCWKALLMKNCNGGATNCTNVQISDSPFRCTQEEARS-LVESVSSSPNKESNE 245
            +I  TH+ W+ L+ ++   G+ + TN+   +S     +EEA   L E+      +E ++
Sbjct: 220 TIIAETHKFWQNLVHQSPASGSISTTNITNRNSEHVIPKEEAEKILAEAPDGGQVEEVSD 279

Query: 246 EEQVWHFL 253
               WHF+
Sbjct: 280 TVDTWHFI 287
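The percentages on the Identities and Positives lines are the counts divided by the full alignment length, gap columns included. For example, query residues 7 to 253 are 247 residues plus one gap column, which gives the denominator of 248. The short Python check below assumes rounding to the nearest whole percent; truncation gives the same values for these HSPs. `percent` is an illustrative helper:

```python
def percent(count: int, alignment_length: int) -> int:
    """Share of alignment columns as a whole-number percentage."""
    return round(100 * count / alignment_length)


# (identities, positives, alignment length) for the BLASTP HSPs in this section
hsps = {
    "64al5 frame 2 vs IPYR_DROME": (134, 177, 248),
    "64al5 frame 3 vs IPYR_DROME": (23, 29, 43),
    "64h6 frame 3 vs SPBC337.09": (49, 74, 113),
}
for name, (ident, pos, length) in hsps.items():
    print(f"{name}: identities {percent(ident, length)}%, positives {percent(pos, length)}%")
```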





Peptide information for frame 3 



ORF from 42 bp to 230 bp; peptide length: 63 
Category: strong similarity to known protein 
Classification: unset 



1 MALYHTEERG QPCSQNYRLF FKNVTGHYIS PFHDIPLKVN SKEDTEAQGI 
51 FIDLSKIWKM AFL 
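The ORF coordinates and peptide lengths in these blocks fit a simple convention: 1-based, inclusive nucleotide positions running from the first base of the start codon to the last base of the final sense codon, so the stop codon is not counted. For DKFZphfbr2_64al5 frame 2, for instance, the ATG begins at nucleotide 230 and the TGA stop occupies 995 to 997. The Python sketch below assumes that convention, which is inferred from the listed values and not documented in the reports, and recomputes the lengths:

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given as 1-based, inclusive coordinates
    that end at the last sense codon (the stop codon is not included)."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3


# (start, end) pairs quoted in the "Peptide information" blocks of this section
orfs = {
    "64al5 frame 2": (230, 994),
    "64al5 frame 3": (42, 230),
    "64cl6 frame 3": (180, 1040),
    "64c4 frame 2": (83, 1483),
    "64jl8 frame 1": (109, 648),
}
for name, (start, end) in orfs.items():
    print(name, peptide_length(start, end))
```

ORFs listed as running "from the beginning" of the insert, such as DKFZphfbr2_64cl6 frame 2, have no start coordinate and fall outside this sketch.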
BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64al5, frame 3 

SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1)
(PYROPHOSPHATE PHOSPHO-HYDROLASE) (PPASE)., N = 1, Score = 118, P =
8.8e-07

PIR:A45153 inorganic pyrophosphatase (EC 3.6.1.1) - bovine, N = 1,
Score = 113, P = 3.1e-06 

TREMBLNEW:AF108211_1 product: "cytosolic inorganic pyrophosphatase"; 
Homo sapiens cytosolic inorganic pyrophosphatase mRNA, partial cds., N = 1, Score = 106, P = 1.8e-05



>SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE
PHOSPHO-HYDROLASE) (PPASE).
Length = 290

HSPs: 

Score = 118 (17.7 bits), Expect = 8.8e-07, P = 8.8e-07
Identities = 23/43 (53%), Positives = 29/43 (67%)



Query: 1 MALYHTEERGQPCSQNYRLFFKNVTGHYISPFHDIPLKVNSKE 43
         MALY T E+G   S +Y L+FKN  G+ ISP HDIPL  N ++
Sbjct: 1 MALYETVEKGAKNSPSYSLYFKNKCGNVISPMHDIPLYANEEK 43

Pedant information for DKFZphfbr2_64al5, frame 2 



Report for DKFZphfbr2_64al5.2 

[LENGTH] 255 

[MW] 29177.34 

[pI] 5.67

[HOMOL] TREMBLNEW:AF108211_1 product: "cytosolic inorganic pyrophosphatase"; Homo sapiens cytosolic inorganic pyrophosphatase mRNA, partial cds. 2e-93

[FUNCAT] 01.04.01 phosphate utilization [S. cerevisiae, YBR011c] 9e-73

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR011c] 9e-73

[FUNCAT] 02.99 other energy generation activities [S. cerevisiae, YMR267w] 1e-58

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YMR267w] 1e-58

[FUNCAT] 1 genome replication, transcription, recombination and repair [M. genitalium, MG351] 1e-06

[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI0124] 2e-06 

[BLOCKS] BL00387D 
[BLOCKS] BL00387C 
[BLOCKS] BL00387B 
[BLOCKS] BL00387A 

[SCOP] d1wgja_ 2.29.5.1.1 Inorganic pyrophosphatase [baker's yeas 1e-113

[EC] 3.6.1.1 Inorganic pyrophosphatase 7e-92 

[PIRKW] mitochondrion 3e-57 

[PIRKW] hydrolase 7e-92 

[PIRKW] homodimer 2e-71 

[SUPFAM] inorganic pyrophosphatase 7e-92 

[PROSITE] PPASE 1

[KW] Alpha_Beta 

[KW] 3D 

[KW] LOW_COMPLEXITY 6.27 % 

SEQ MKKARNDEYENLFNMIVEIPRWTKAKMEIATKEPMNPIKQYVKDGKLRYVANIFPYKGYI 

SEG 

1hukB EGGGCEEEEEEEETTTbCBCEEETTTTTTTCEEECEETTEECBCCBBTTBTTbT



SEQ WNYGTLPQTWEDPHEKDKSTNCFGDNDPIDVCEIGSKILSCGEVIHVKILGILALIDEGE 

SEG 

1hukB CEEEETTTTCBTTTTEETTTTEECCCBCCEEEECCCCCCTTTEEEEEEEEEEEEETTTTB

SEQ TDWKLIAINANDPEASKFHDIDDVKKFKPGYLEATLNWFRLCKVPDGKPENQFAFNGEFK 

SEG 

1hukB CEEEEEEEETTTTTGGGCCCHHHHHHHTTTHHHHHHHHHHHHCGGGCCCCCCBCGGGCCB

SEQ NKAFALEVIKSTHQCWKALLMKNCNGGATNCTNVQISDSPFRCTQEEARSLVESVSSSPN 

SEG xxxxxxxxx 

1hukB CHHHHHHHHHHHHHHHHHHHHCTTTTTTTCCCBTTTTTTT
SEQ KESNEEEQVWHFLGK

SEG xxxxxxx

1hukB
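The LOW_COMPLEXITY keyword appears to give the fraction of residues masked by SEG, that is, the x characters in the SEG rows. For DKFZphfbr2_64al5.2, 16 masked residues (runs of 9 and 7) out of 255 give 6.27 %. The same arithmetic matches the COILED_COIL figure reported for DKFZphfbr2_64cl6.3 (43 of 287 residues marked in the COILS rows, 14.98 %). The sketch below assumes that this is how the figures were derived. `masked_percent` is a hypothetical helper, not part of the report generator:

```python
def masked_percent(track_rows: list[str], length: int, mark: str = "x") -> float:
    """Percentage of residues flagged with `mark` in an annotation track,
    rounded to two decimals as in the [KW] lines."""
    flagged = sum(row.count(mark) for row in track_rows)
    return round(100 * flagged / length, 2)


# SEG rows of DKFZphfbr2_64al5.2 and COILS rows of DKFZphfbr2_64cl6.3
print(masked_percent(["x" * 9, "x" * 7], 255))          # LOW_COMPLEXITY
print(masked_percent(["C" * 14, "C" * 29], 287, "C"))    # COILED_COIL
```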



Prosite for DKFZphfbr2_64al5.2
PS00387 85->92 PPASE PDOC00325 

(No Pfam data available for DKFZphfbr2_64al5.2) 

Pedant information for DKFZphfbr2_64al5, frame 3 

Report for DKFZphfbr2_64al5.3

[LENGTH] 63

[MW] 7405.54 

[pI] 6.81

[HOMOL] SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE PHOSPHO-HYDROLASE) (PPASE). 1e-06

[EC] 3.6.1.1 Inorganic pyrophosphatase 5e-06

[PIRKW] hydrolase 5e-06

[SUPFAM] inorganic pyrophosphatase 5e-06 

[KW] All_Beta 

SEQ MALYHTEERGQPCSQNYRLFFKNVTGHYISPFHDIPLKVNSKEDTEAQGIFIDLSKIWKM 
PRD cccccccccccccccceeeeeecccccccccccccccccccccccccceeeechhhhhhh 

SEQ AFL 
PRD ccc

(No Prosite data available for DKFZphfbr2_64al5.3)
(No Pfam data available for DKFZphfbr2_64al5.3)
DKFZphfbr2_64cl6



group: brain derived 

DKFZphfbr2_64cl6.2 encodes a novel 101 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

unknown 

complete cDNA, complete cds, EST hits 
Sequenced by Qiagen 

Locus: /map="745_A_2; 756_F_2; 842_C_2"
Insert length: 1866 bp 

Poly A stretch at pos. 1848, polyadenylation signal at pos. 1829 

1 GGGCGCGGCG CCGGAGGAGG AAGTGGTGAG GTTGTTGCTC CTTCAGCGCC 
51 TATCGCTGGC TCTTGGGGCG CAGAGAGGGG CCGCAGTCTC CGCGGCTGCG 

101 TCGAGCTCCC TTGCAGTCCC CTCCATGTTC CCCGGCGCCA CTACTCCCCT 

151 TCCTAAGGCC GCCGCTTACC CCGGGGTCTA TGGAAGTAAT GGAAGGACCC 

201 CTCAACCTGG CTCATCAACA GAGCAGACGA GCAGACCGTT TATTAGCTGC 

251 AGGCAAATAC GAAGAGGCTA TTTCTTGTCA CAAAAAGGCT GCAGCATATC 

301 TTTCTGAAGC CATGAAGCTG ACACAGTCAG AGCAGGCTCA TCTTTCACTG 

351 GAATTGCAAA GGGATAGCCA TATGAAACAG CTCCTCCTCA TCCAAGAGAG 

401 ATGGAAAAGG GCCCAGCGTG AAGAAAGATT GAAAGCCCAG CAGAACACAG 

451 ACAAGGATGC AGCTGCCCAT CTTCAGACAT CTCACAAACC CTCTGCAGAG 

501 GATGCAGAGG GCCAGAGTCC CCTTTCTCAG AAGTACAGCC CTTCCACAGA 

551 GAAATGCCTG CCTGAGATTC AGGGGATCTT TGACAGGGAT CCAGACACAC 

601 TACTTTATTT ACTTCAGCAA AAGAGTGAGC CAGCAGAGCC ATGTATTGGA

651 AGCAAAGCCC CAAAAGATGA TAAAACAATT ATAGAGGAGC AGGCAACCAA 

701 AATTGCAGAT TTGAAGAGGC ATGTGGAATT CCTTGTGGCT GAGAATGAAA 

751 GATTAAGGAA AGAAAATAAA CAACTAAAGG CTGAAAAGGC CAGACTTCTA

801 AAAGGTCCAA TAGAAAAGGA GCTGGATGTA GATGCTGATT TTGTAGAAAC 

851 GTCAGAGTTA TGGAGCTTGC CACCACATGC AGAAACTGCT ACAGCCTCCT 

901 CAACCTGGCA GAAGTTCGCA GCAAATACTG GGAAAGCCAA GGACATTCCA 

951 ATCCCCAATC TTCCTCCCTT GGATTTTCCA TCTCCAGAAC TTCCTCTTAT 
1001 GGAGCTCTCT GAGGATATTC TGAAAGGACT TATGAATAAT TAAAATGGAA 
1051 GGCCACAGAA AAGGGGAAAA GAGGAAATAA TACAGTAATC GTTAATCCAG 
1101 CAAAAAGAAA TGAAAAGGGA AAACCACATA GAAGGGTAAT CCCGGAAATG 
1151 CTTCATCTGG TGGACTGTGG GAGCAGAGGC ATTGCCAGGA CTTGGGAAAC 
1201 AGTCACTGTG AAATGCGCTG CGTATCTCAT TCACTCACTT CAGCTAATGA 
1251 CTCCGACTTG GCAGACGCTA AACTCATGGA GGTTCGGTTT CTCCTGATAC 
1301 AAACCAAATG GCTACCTGGA AGAATTTCTT TCAAGCAACA GTTATTTTTC 
1351 TTATCTTCAG GGTTAAAATG TATAAAAGTT ATGTGTAATT AATCTATAAT 
1401 GCCATAAATG ATAATGCAAA ACCTAAATAA TATGGTGGCC GGAGGGGCTG 
1451 CCTTATATTT GAAACATGCT TTCTATCATG CATTGACTGT ATGCATTTTG 
1501 TTAATGCACA TTCTGTTTGT TTAAGGTGTG TGAGATACAC ACCTTTCTAG 
1551 ATGAAACTAT ATGTGCCACA CTTTGCACTA CTCATAATGA TAACCTCAAG 
1601 ACTATCAGAA GAAATATTTA AATTTCCATT TTATGAAGAA AGGAACCAAA 
1651 TTATTATGCT TTTTAAAACA AATTACCAGT TTACATAATT AATCAGGGTG 
1701 CATTTTAAGT TCTAACTTCG TTTATTGTAT AATGCATCAT TTGAAAATAC 
1751 CAAGGAGGAA ATACCCTTTG TTTTTAATGA TGCAAGAGTG GACGTAATGC 
1801 TAGTTGGCAG TATTTTATTG TAAGAAATCA ATAAAGTAAT TGTGTTTTAA 
1851 AAAAAAAAAA AAAAAA 



BLAST Results 



Entry HS28614 3 from database EMBL: 
human STS WI-6844. 
Score = 1460, P = 3.4e-61, identities = 292/292



Medline entries 



No Medline entry 
Peptide information for frame 2



ORF from the beginning to 304 bp; peptide length: 102 
Category: questionable ORF 
Classification: unset 



1 GAAPEEEVVR LLLLQRLSLA LGAQRGAAVS AAASSSLAVP SMFPGATTPL 
51 PKAAAYPGVY GSNGRTPQPG SSTEQTSRPF ISCRQIRRGY FLSQKGCSIS 
101 F 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_64cl6, frame 2 



No Alert BLASTP hits found 



Peptide information for frame 3 



ORF from 180 bp to 1040 bp; peptide length: 287 
Category: putative protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (178-200) 
LEUCINE_ZIPPER (185-207) 



1 MEVMEGPLNL AHQQSRRADR LLAAGKYEEA ISCHKKAAAY LSEAMKLTQS 

51 EQAHLSLELQ RDSHMKQLLL IQERWKRAQR EERLKAQQNT DKDAAAHLQT 

101 SHKPSAEDAE GQSPLSQKYS PSTEKCLPEI QGIFDRDPDT LLYLLQQKSE 

151 PAEPCIGSKA PKDDKTIIEE QATKIADLKR HVEFLVAENE RLRKENKQLK 

201 AEKARLLKGP IEKELDVDAD FVETSELWSL PPHAETATAS STWQKFAANT 

251 GKAKDIPIPN LPPLDFPSPE LPLMELSEDI LKGLMNN 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64cl6, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_64cl6, frame 2 



Report for DKFZphfbr2_64cl6.2



[LENGTH] 101 

[MW] 10469.94 

[pI] 10.18

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 29.70 %



SEQ GAAPEEEVVRLLLLQRLSLALGAQRGAAVSAAASSSLAVPSMFPGATTPLPKAAAYPGVY 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccc 

SEQ GSNGRTPQPGSSTEQTSRPFISCRQIRRGYFLSQKGCSISF

SEG 

PRD ccccccccccccccccccccchhhhhccccccccccccccc 



(No Prosite data available for DKFZphfbr2_64cl6.2)
(No Pfam data available for DKFZphfbr2_64cl6.2)



Pedant information for DKFZphfbr2_64cl6, frame 3 
Report for DKFZphfbr2_64cl6.3

[LENGTH] 287

[MW] 32343.79

[pI] 5.61

[PROSITE] LEUCINE_ZIPPER 2

[KW] All_Alpha

[KW] COILED_COIL 14.98 %



SEQ MEVMEGPLNLAHQQSRRADRLLAAGKYEEAISCHKKAAAYLSEAMKLTQSEQAHLSLELQ 

PRD ccccchhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ RDSHMKQLLLIQERWKRAQREERLKAQQNTDKDAAAHLQTSHKPSAEDAEGQSPLSQKYS 

PRD hhcchhhhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhcccccccccccccccccccc 

COILS 

SEQ PSTEKCLPEIQGIFDRDPDTLLYLLQQKSEPAEPCIGSKAPKDDKTIIEEQATKIADLKR 

PRD cccccccchhhhhcccccchhhhhhhhhcccccccccccccccchhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCC

SEQ HVEFLVAENERLRKENKQLKAEKARLLKGPIEKELDVDADFVETSELWSLPPHAETATAS 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccccccc 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ STWQKFAANTGKAKDIPIPNLPPLDFPSPELPLMELSEDILKGLMNN 

PRD hhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhhccc 



COILS 



Prosite for DKFZphfbr2_64cl6.3

PS00029 178->200 LEUCINE_ZIPPER PDOC00029
PS00029 185->207 LEUCINE_ZIPPER PDOC00029
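The start->end ranges in these Prosite tables appear to use a 1-based start and an end one past the last matched residue. The leucine-zipper pattern L-x(6)-L-x(6)-L-x(6)-L covers 22 residues, and 178->200 spans residues 178 to 199 of DKFZphfbr2_64cl6.3. The minimal Python sketch below works under that assumption and scans the frame-3 peptide with an equivalent regular expression. `scan` is an illustrative helper, not part of the report software:

```python
import re

# PROSITE PS00029 (leucine zipper) L-x(6)-L-x(6)-L-x(6)-L as a regex; the
# zero-width lookahead lets overlapping zippers be reported separately
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def scan(sequence: str, pattern: re.Pattern) -> list[tuple[int, int]]:
    """Return (start, end) pairs: a 1-based start and an end one past the
    last matched residue, i.e. the 'start->end' style of the Prosite tables."""
    hits = []
    for m in pattern.finditer(sequence):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


# DKFZphfbr2_64cl6.3 peptide (287 residues), copied from the SEQ rows above
seq = (
    "MEVMEGPLNLAHQQSRRADRLLAAGKYEEAISCHKKAAAYLSEAMKLTQSEQAHLSLELQ"
    "RDSHMKQLLLIQERWKRAQREERLKAQQNTDKDAAAHLQTSHKPSAEDAEGQSPLSQKYS"
    "PSTEKCLPEIQGIFDRDPDTLLYLLQQKSEPAEPCIGSKAPKDDKTIIEEQATKIADLKR"
    "HVEFLVAENERLRKENKQLKAEKARLLKGPIEKELDVDADFVETSELWSLPPHAETATAS"
    "STWQKFAANTGKAKDIPIPNLPPLDFPSPELPLMELSEDILKGLMNN"
)
print(scan(seq, LEUCINE_ZIPPER))  # [(178, 200), (185, 207)]
```

The two reported zippers share three of their four leucines (185, 192 and 199). A plain, non-overlapping search would resume after residue 199 and miss the second hit, so the lookahead is needed to recover it.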



(No Pfam data available for DKFZphfbr2_64cl6.3) 
DKFZphfbr2_64c4



group: brain derived 

DKFZphfbr2_64c4 encodes a novel 467 amino acid protein with similarity to A. thaliana T08I13.5.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to A. thaliana T08I13.5 

complete cDNA, complete cds, EST hits 

on genomic level encoded by AC005043 (11 exons)

Sequenced by Qiagen 

Locus: unknown

Insert length: 1559 bp 

Poly A stretch at pos. 1540, no polyadenylation signal found 



1 TGGGACCGCC GGAAGTTTCT GCCGCGGCTT TGCGGGGACG GGGGAGTGGT 

51 AGTGGGGGCT GCAGCTGCCG GACCCAGGCG CGATGGCTAC GGGCGCGGAT 

101 GTACGGGACA TTCTAGAACT CGGGGGTCCA GAAGGGGATG CAGCCTCTGG 

151 GAGCATCAGC AAGAAGGACA TTATCAACCC GGACAAGAAA AAATCCAAGA

201 AGTCCTCTGA GACACTGACT TTCAAGAGGC CCGAGGGCAT GCACCGGGAA 

251 GTCTATGCCT TGCTCTACTC TGACAAGAAG GATGCACCCC CACTGCTACC 

301 CAGTGACACT GGCCAGGGAT ACCGTACAGT GAAGGCCAAG TTGGGCTCCA 

351 AGAAGGTGCG GCCTTGGAAG TGGATGCCAT TCACCAACCC GGCCCGCAAG 

401 GACGGAGCAA TGTTCTTCCA CTGGCGACGT GCAGCGGAGG AGGGCAAGGA 

451 CTACCCCTTT GCCAGGTTCA ATAAGACTGT GCAGGAGCCT GTGTACTCGG 

501 AGCAGGAGTA CCAGCTTTAT CTCCACGATA ATGCTTGGAC TAAGGCAGAA 

551 ACTGACCACC TCTTTGACCT CAGCCGCCGC TTTGACCTGC GTTTTGTTGT 

601 TATCCATGAC CGGTATGACC ACCAGCAGTT CAAGAAGCGT TCTGTGGAAG 

651 ACCTGAAGGA GCGGTACTAC CACATCTGTG CTAAGCTTGC CAACGTGCGG 

701 GCTGTGCCAG GCACAGACCT TAAGATACCA GTATTTGATG CTGGGCACGA 

751 ACGACGGCGG AAGGAACAGC TTGAGCGTCT CTACAACCGG ACCCCAGAGC 

801 AGGTGGCAGA GGAGGAGTAC CTGCTACAGG AGCTGCGCAA GATTGAGGCC 

851 CGGAAGAAGG AGCGGGAGAA ACGCAGCCAG GACCTGCAGA AGCTGATCAC 

901 AGCGGCAGAC ACCACTGCAG AGCAGCGGCG CACGGAACGC AAGGCCCCCA 

951 AAAAGAAGCT ACCCCAGAAA AAGGAGGCTG AGAAGCCGGC TGTTCCTGAG 

1001 ACTGCAGGCA TCAAGTTTCC AGACTTCAAG TCTGCAGGTG TCACGCTGCG 

1051 GAGCCAACGG ATGAAGCTGC CAAGCTCTGT GGGACAGAAG AAGATCAAGG 

1101 CCCTGGAACA GATGCTGCTG GAGCTTGGTG TGGAGCTGAG CCCGACACCT 

1151 ACGGAGGAGC TGGTGCACAT GTTCAATGAG CTGCGAAGCG ACCTGGTGCT 

1201 GCTCTACGAG CTCAAGCAGG CCTGTGCCAA CTGCGAGTAT GAGCTGCAGA 

1251 TGCTGCGGCA CCGTCATGAG GCACTGGCCC GGGCTGGTGT GCTAGGGGGC 

1301 CCTGCCACAC CAGCATCAGG CCCAGGCCCG GCCTCTGCTG AGCCGGCAGT 

1351 GTCTGAACCC GGACTTGGTC CTGACCCCAA GGACACCATC ATTGATGTGG 

1401 TGGGCGCACC CCTCACGCCC AATTCGAGAA AGCGACGGGA GTCGGCCTCC 

1451 AGCTCATCTT CCGTGAAGAA AGCCAAGAAG CCGTGAGAGG CCCCACGGGG 

1501 TGTGGGCGAC GCTGTTATGT AAATAGAGCT GCTGAGTTGG AAAAAAAAAA 

1551 AAAAAAAAA 



BLAST Results 



Entry AC005043 from database EMBL: 

Homo sapiens clone NH0576N21; HTGS phase 1, 5 unordered pieces. 
Score = 1506, P = 4.6e-244, identities = 316/330



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 83 bp to 1483 bp; peptide length: 467
Category: similarity to unknown protein



1 MATGADVRDI LELGGPEGDA ASGTISKKDI INPDKKKSKK SSETLTFKRP 
51 EGMHREVYAL LYSDKKDAPP LLPSDTGQGY RTVKAKLGSK KVRPWKWMPF 
101 TNPARKDGAM FFHWRRAAEE GKDYPFARFN KTVQEPVYSE QEYQLYLHDN 
151 AWTKAETDHL FDLSRRFDLR FVVIHDRYDH QQFKKRSVED LKERYYHICA 
201 KLANVRAVPG TDLKIPVFDA GHERRRKEQL ERLYNRTPEQ VAEEEYLLQE 
251 LRKIEARKKE REKRSQDLQK LITAADTTAE QRRTERKAPK KKLPQKKEAE 
301 KPAVPETAGI KFPDFKSAGV TLRSQRMKLP SSVGQKKIKA LEQMLLELGV 
351 ELSPTPTEEL VHMFNELRSD LVLLYELKQA CANCEYELQM LRHRHEALAR 
401 AGVLGGPATP ASGPGPASAE PAVSEPGLGP DPKDTIIDVV GAPLTPNSRK 
451 RRESASSSSS VKKAKKP

BLASTP hits

Entry ATAC2337_5 from database TREMBLNEW:

gene: "T08I13.5"; Arabidopsis thaliana chromosome II BAC T08I13
genomic sequence, complete sequence. 

Score = 340, P = 2.6e-30, identities = 115/374, positives = 176/374

Entry YE8D_SCHPO from database SWISSPROT: 

HYPOTHETICAL 47.1 KD PROTEIN C9G1.13C IN CHROMOSOME I. 

Score = 221, P = 1.9e-20, identities = 67/192, positives = 97/192

Entry S64291 from database PIR: 

hypothetical protein YGR002c - yeast (Saccharomyces cerevisiae) 
Score = 202, P = 2.8e-13, identities = 71/260, positives = 124/260



Alert BLASTP hits for DKFZphfbr2_64c4, frame 2
No Alert BLASTP hits found 



Pedant information for DKFZphfbr2_64c4, frame 2





Report for DKFZphfbr2_64c4.2

[LENGTH] 467

[MW] 53007.60

[pI] 9.51

[HOMOL] TREMBL:ATAC2337_5 gene: "T08I13.5"; Arabidopsis thaliana chromosome II BAC T08I13 genomic sequence, complete sequence. 4e-29

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR002c] 1e-19

[PROSITE] MYRISTYL 1

[PROSITE] CAMP_PHOSPHO_SITE 4

[PROSITE] CK2_PHOSPHO_SITE 10

[PROSITE] TYR_PHOSPHO_SITE 3

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 12

[PROSITE] ASN_GLYCOSYLATION 1

[KW] All_Alpha

[KW] LOW_COMPLEXITY 20.13 %





SEQ MATGADVRDILELGGPEGDAASGTISKKDIINPDKKKSKKSSETLTFKRPEGMHREVYAL

SEG xxxxxxxxxxxxxxxxxx 

PRD ccceeeeeeeeeeccccccccccccccccccccccccccccccccccccccchhhhhhhh 

SEQ LYSDKKDAPPLLPSDTGQGYRTVKAKLGSKKVRPWKWMPFTNPARKDGAMFFHWRRAAEE 

SEG 

PRD hhhhccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhc 

SEQ GKDYPFARFNKTVQEPVYSEQEYQLYLHDNAWTKAETDHLFDLSRRFDLRFVVIHDRYDH 

SEG 

PRD ccccccccccccccccchhhhhhhhhhhcchhhhhhhhhhhhhhhhccceeeeeeccccc 

SEQ QQFKKRSVEDLKERYYHICAKLANVRAVPGTDLKIPVFDAGHERRRKEQLERLYNRTPEQ

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhhhhcchhh 

SEQ VAEEEYLLQELRKIEARKKEREKRSQDLQKLITAADTTAEQRRTERKAPKKKLPQKKEAE 

SEG xxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KPAVPETAGIKFPDFKSAGVTLRSQRMKLPSSVGQKKIKALEQMLLELGVELSPTPTEEL

SEG xxx 
PRD hccccccccccccccccceeehhhhhhhccccccchhhhhhhhhhhhhhhhcccccchhh



SEQ VHMFNELRSDLVLLYELKQACANCEYELQMLRHRHEALARAGVLGGPATPASGPGPASAE 

SEG xxxxxxxxxxxxxxxx 

PRD hhhhhhccchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccccccccccccccccccc 

SEQ PAVSEPGLGPDPKDTIIDVVGAPLTPNSRKRRESASSSSSVKKAKKP 

SEG xxxxxxx xxxxxxxxxxxxxxxxxxx

PRD cccccccccccccceeeeeccccccccccccccccccccceeecccc 



Prosite for DKFZphfbr2_64c4.2

PS00001 130->134 ASN_GLYCOSYLATION PDOC00001
PS00002 412->416 GLYCOSAMINOGLYCAN PDOC00002
PS00004 35->39 CAMP_PHOSPHO_SITE PDOC00004
PS00004 39->43 CAMP_PHOSPHO_SITE PDOC00004
PS00004 184->188 CAMP_PHOSPHO_SITE PDOC00004
PS00004 451->455 CAMP_PHOSPHO_SITE PDOC00004
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005
PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005
PS00005 46->49 PKC_PHOSPHO_SITE PDOC00005
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005
PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005
PS00005 284->287 PKC_PHOSPHO_SITE PDOC00005
PS00005 321->324 PKC_PHOSPHO_SITE PDOC00005
PS00005 324->327 PKC_PHOSPHO_SITE PDOC00005
PS00005 448->451 PKC_PHOSPHO_SITE PDOC00005
PS00005 460->463 PKC_PHOSPHO_SITE PDOC00005
PS00006 3->7 CK2_PHOSPHO_SITE PDOC00006
PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006
PS00006 132->136 CK2_PHOSPHO_SITE PDOC00006
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006
PS00006 153->157 CK2_PHOSPHO_SITE PDOC00006
PS00006 187->191 CK2_PHOSPHO_SITE PDOC00006
PS00006 273->277 CK2_PHOSPHO_SITE PDOC00006
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006
PS00006 355->359 CK2_PHOSPHO_SITE PDOC00006
PS00006 435->439 CK2_PHOSPHO_SITE PDOC00006
PS00007 131->139 TYR_PHOSPHO_SITE PDOC00007
PS00007 227->235 TYR_PHOSPHO_SITE PDOC00007
PS00007 116->125 TYR_PHOSPHO_SITE PDOC00007
PS00008 14->20 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_64c4.2)
DKFZphfbr2_64h6



group: brain derived 

DKFZphfbr2_64h6 encodes a novel 176 amino acid protein with similarity to predicted yeast 
proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to S.pombe SPBC337.09 and S.cerevisiae YER044c 

complete cDNA, complete cds according to YER044c/SPBC337.09,
start at Bp 111, EST hits 

Sequenced by Qiagen 

Locus: /map="14" 

Insert length: 1212 bp 

Poly A stretch at pos. 1192, polyadenylation signal at pos. 1168 

1 GGGCTGGAGC TGTCCTGGGG GAGCTTGTTT GCGGCAGCGG CTGCTGCTGC 
51 CACTGCTGTG CTGGGGGCCC GGTCGCCAGG CAAAAAGCCC TCCCACGTTT 

101 GAGGGGAGTC ATGAGCCGTT TCCTGAATGT GTTAAGAAGT TGGCTGGTTA 

151 TGGTGTCCAT CATAGCCATG GGGAACACGC TGCAGAGCTT CCGAGACCAC 

201 ACTTTTCTCT ATGAAAAGCT CTACACTGGC AAGCCAAACC TTGTGAATGG 

251 CCTCCAAGCT CGGACCTTTG GGATCTGGAC GCTGCTCTCA TCAGTGATCC 

301 GCTGCCTCTG TGCCATTGAC ATTCACAACA AGACGCTCTA TCACATCACA 

351 CTCTGGACCT TCCTCCTTGC CCTGGGGCAT TTCCTCTCTG AGTTGTTTGT 

401 CTATGGAACT GCAGCTCCCA CGATTGGCGT CCTGGCACCC CTGATGGTGG 

451 CAAGTTTCTC CATCCTGGGT ATGCTGGTCG GGCTCCGGTA TCTAGAAGTA

501 GAACCAGTAT CCAGACAGAA GAAGAGAAAC TGAGGCCAGC ATTATCACCT 

551 CCAGGACTTT CTCGTTTTCC ACCTTGGCCA TCTTCTTCCT TCGTCGTCTC 

601 TCCCCTTTAA TTTCTTTTCT ATTCCATCAT CTGCCCTTTT ACTCACTTTT 

651 AGCCTCTTTT TTTAATTTTT AAAATTTAAA GATATGCATA CTGAAAAGTA

701 TATAACATGT ACGTACAATT TAAAGAATAA TTTTAAAGTG AATACTACGT 

751 AACTCCATCC AAGTCAAGAA ATTGCCAGCT TCTCGGAAGC CCACTGTGTC 

801 TCCTTCCCCT ACCTGCAACC TCTTCCAGGC TCCCTTTTCC AGCCTTCCCC 

851 TTTTTCCCTT TTATTTTCAT GCCTTGATTT GACTTGTGTG GTGGGAACAT 

901 GTGAACTATG AAACTTAAAC CTGCTGCCCA CCCAGAGCAG CTGTGACCAA 

951 GGGCTGCCTC AAGGGGTTGT CCACGCAGGT TGGGCTCCTC TCTGCTGCTG 
1001 GACCCAAGAC TCTGAACCTT CCAAGGGACA GGCAGTTCTT CTGAGAAGGG 
1051 CTCCCCTGTG TGTGAGCAAG ACCACAGCTC TCCTTCTATC TACAGATGCA 
1101 TGAGGGTTGG AAGAGTCTGG GCTGTTTTTA GACCTTCTGG TCAGCTGTAT 
1151 TTGTGTAACA ACTTTTGTAA TAAATAGAAA AACCCTCTGC TCAAAAAAAA 
1201 AAAAAAAAAA AA 

BLAST Results 



Entry G38566 from database EMBL: 

SHGC-64295 Human Homo sapiens STS genomic, sequence tagged site. 
Score = 1398, P = 1.4e-56, identities = 284/288 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 0 bp to 530 bp; peptide length: 177 
Category: similarity to unknown protein 
Classification: unclassified 



1 AGAVLGELVC GSGCCCHCCA GGPVARQKAL PRLRGVMSRF LNVLRSWLVM 
51 VSIIAMGNTL QSFRDHTFLY EKLYTGKPNL VNGLQARTFG IWTLLSSVIR 
101 CLCAIDIHNK TLYHITLWTF LLALGHFLSE LFVYGTAAPT IGVLAPLMVA 
151 SFSILGMLVG LRYLEVEPVS RQKKRN



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64h6, frame 3 

TREMBL:SPBC337_9 gene: "SPBC337.09"; product: "conserved hypothetical
protein"; S.pombe chromosome II cosmid c337., N = 1, Score = 224, P =
1.4e-18

PIR:S50547 hypothetical protein YER044c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 192, P = 3.4e-15



>TREMBL:SPBC337_9 gene: "SPBC337.09"; product: "conserved hypothetical
protein"; S.pombe chromosome II cosmid c337.
Length = 136

HSPs: 

Score = 224 (33.6 bits), Expect = 1.4e-18, P = 1.4e-18
Identities = 49/113 (43%), Positives = 74/113 (65%) 



Query:  42 NVLRSWLVMVSIIAMGNTLQSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRC 101
           +++  W V+VS+ A+ NT+QSF     L +++Y+   N VNGLQ RTFGIWTLLS+++R
Sbjct:  11 SLVAKWNVVVSVAALFNTVQSFLTPK-LTKRVYSNT-NEVNGLQGRTFGIWTLLSAIVRF 68

Query: 102 LCAIDIHNKTLYHITLWTFLLALGHFLSELFVYGTAAPTIGVLAPLMVASFSI 154
            CA  I N  +Y +   T+ LA  HFLSE  ++ T     G+L+P++V++ SI
Sbjct:  69 YCAYHITNPDVYFLCQCTYYLACFHFLSEWLLFRTTNLGPGLLSPIVVSTVSI 121

Pedant information for DKFZphfbr2_64h6, frame 3 



Report for DKFZphfbr2_64h6.3

[LENGTH] 176

[MW] 19359.31

[pI] 9.53

[HOMOL] TREMBL:SPBC337_9 gene: "SPBC337.09"; product: "conserved hypothetical protein"; S.pombe chromosome II cosmid c337. 2e-17

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YER044c] 7e-16

[KW] TRANSMEMBRANE 2 

[KW] LOW_COMPLEXITY 7.39 % 

SEQ AGAVLGELVCGSGCCCHCCAGGPVARQKALPRLRGVMSRFLNVLRSWLVMVSIIAMGNTL

SEG xxxxxxxxxxxxx 

PRD ccceeeeeeeeccceeeeccccccccccccccccchhhhhhhhhhhhhhheeeecccccc 
MEM MMMMMMMMMMMMMMMMM



SEQ QSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRCLCAIDIHNKTLYHITLWTF

SEG 

PRD ccccchhhhhhhhhhcccccccccccccccchhhhhhhhhhhhhhhccccceeeehhhhh 

MEM 

SEQ LLALGHFLSELFVYGTAAPTIGVLAPLMVASFSILGMLVGLRYLEVEPVSRQKKRN 

SEG 

PRD hhhhhhhhhhhhhhhccccccccccceeehhhhhhhhhhhheeeeecccccccccc 

MEM MMMMMMMMMMMMMMMMM 



(No Prosite data available for DKFZphfbr2_64h6.3)
(No Pfam data available for DKFZphfbr2_64h6.3)
DKFZphfbr2_64jl8



group: Intracellular transport and trafficking 

DKFZphfbr2_64jl8.1 encodes a novel 180 amino acid protein nearly identical to the 23 kDa subunit
of microsomal signal peptidase from Canis familiaris, Gallus gallus and C. elegans.

The new protein is identical to the 23 kDa subunit of canine and chicken microsomal signal peptidase.
The canine microsomal signal peptidase is a protein complex composed of five subunits (25,
22/23, 21, 18 and 12 kDa). The 23 kDa subunit is tightly associated with the 18- and 21-kDa
subunits, which are integral membrane proteins.

The new protein can find application in modulation of protein transport into microsomal 
compartments and as a tool for proteomic analysis. 



strong similarity to dog signal peptidase (EC 3.4.99.-) 

complete cDNA, complete cds, potential start at Bp 109, EST hits, 

Sequenced by Qiagen 

Locus: unknown

Insert length: 690 bp

Poly A stretch at pos. 666, polyadenylation signal at pos. 646
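The polyadenylation-signal positions in these entries appear to be 0-based. In DKFZphfbr2_64jl8 the hexamer ATTAAA begins at nucleotide 647 of the listing below and is reported at 646. The same one-position offset holds for DKFZphfbr2_64al5 (AATAAA at 1152, reported 1151) and DKFZphfbr2_64cl6 (AATAAA at 1830, reported 1829). The sketch below assumes that convention and a search set of AATAAA and ATTAAA. The tool and hexamer list actually used are not stated in the source, and `last_polya_signal` is a hypothetical helper:

```python
import re
from typing import Optional

# Assumed search set: the canonical AATAAA plus the common ATTAAA variant
POLYA_SIGNAL = re.compile(r"(?=(AATAAA|ATTAAA))")


def last_polya_signal(cdna: str, offset: int = 0) -> Optional[int]:
    """Return the 0-based start of the 3'-most AATAAA/ATTAAA hexamer, or None.

    `offset` is the 0-based position of cdna[0] in the full insert, so a
    3' fragment can be scanned without retyping the whole sequence.
    """
    starts = [m.start() for m in POLYA_SIGNAL.finditer(cdna.upper())]
    return offset + starts[-1] if starts else None


# Nucleotides 601-690 of DKFZphfbr2_64jl8, i.e. 0-based offset 600
tail = (
    "CACGTATCTG" "TCCCATTTCC" "AGATACATAT" "GAAATAACGA" "AGAGTTATTA"
    "AATTATTCTG" "AATTTGAAAC" "AAAAAAAAAA" "AAAAAAAAAA"
)
print(last_polya_signal(tail, offset=600))  # 646
```

The reported poly A stretch positions do not follow from a simple run-of-A search (64jl8 is reported at 666 although the pure A run starts at 671), so this sketch covers only the signal hexamer.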



1 GCCGGAACGC GCGCACCGCA GACGGCGCGG ATCGCAGGGA GCCGGTCCGC 

51 CGCCGGAACG GGAGCCTGGG TGTGCGTGTG GAGTCCGGAC TCGTGGGAGA 

101 CGATCGCGAT GAACACGGTG CTGTCGCGGG CGAACTCACT GTTCGCCTTC 

151 TCGCTGAGCG TGATGGCGGC GCTCACCTTC GGCTGCTTCA TCACCACCGC 

201 CTTCAAAGAC AGGAGCGTCC CGGTGCGGCT GCACGTCTCG CGGATCATGC 

251 TAAAAAATGT AGAAGATTTC ACTGGACCTA GAGAAAGAAG TGATCTGGGA 

301 TTTATCACAT CTGATATAAC TGCTGATCTA GAGAATATAT TTGATTGGAA 

351 TGTTAAGCAG TTGTTTCTTT ATTTATCAGC AGAATATTCA ACAAAAAATA 

401 ATGCTCTGAA CCAAGTTGTC CTATGGGACA AGATTGTTTT GAGAGGTGAT 

451 AATCCGAAGC TGCTGCTGAA AGATATGAAA ACAAAATATT TTTTCTTTGA 

501 CGATGGAAAT GGTCTCAAGG GAAACAGGAA TGTCACTTTG ACCCTGTCTT 

551 GGAACGTCGT ACCAAATGCT GGAATTCTAC CTCTTGTGAC AGGATCAGGA 

601 CACGTATCTG TCCCATTTCC AGATACATAT GAAATAACGA AGAGTTATTA 

651 AATTATTCTG AATTTGAAAC AAAAAAAAAA AAAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



89034208: cDNA-derived primary structure of the glycoprotein component of canine microsomal signal peptidase complex.



Peptide information for frame 1 



ORF from 109 bp to 648 bp; peptide length: 180 
Category: strong similarity to known protein 
Prosite motifs: TONB_DEPENDENT REC 1 (1-58) 
RGD (148-151) 



1 MNTVLSRANS LFAFSLSVMA ALTFGCFITT AFKDRSVPVR LHVSRIMLKN 

51 VEDFTGPRER SDLGFITSDI TADLENIFDW NVKQLFLYLS AEYSTKNNAL 

101 NQVVLWDKIV LRGDNPKLLL KDMKTKYFFF DDGNGLKGNR NVTLTLSWNV 

151 VPNAGILPLV TGSGHVSVPF PDTYEITKSY 

BLASTP hits
No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_64jl8, frame 1
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_64jl8, frame 1



Report for DKFZphfbr2_64jl8.1

[LENGTH] 180

[MW] 20253.39

[pI] 8.66

[HOMOL] PIR:A31788 signal peptidase (EC 3.4.99.-) (SPC 22/23) - dog 1e-100

[FUNCAT] 30.07 organization of endoplasmatic reticulum (S. cerevisiae, 6e-15

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YLR066w] 6e-15

[PIRKW] transmembrane protein 2e-92

[PIRKW] glycoprotein 2e-92

[PIRKW] hydrolase 2e-92

[PROSITE] RGD 1

[PROSITE] MYRISTYL 2

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] TONB_DEPENDENT_REC_1 1

[PROSITE] PKC_PHOSPHO_SITE 1

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta

[KW] SIGNAL_PEPTIDE 32





SEQ MNTVLSRANSLFAFSLSVMAALTFGCFITTAFKDRSVPVRLHVSRIMLKNVEDFTGPRER

PRD ccccccchhhhhhhhhhhhhhhhhhhhhheeecccccceeehhhhhhhhhhhhccccccc 

SEQ SDLGFITSDITADLENIFDWNVKQLFLYLSAEYSTKNNALNQVVLWDKIVLRGDNPKLLL

PRD ccccchhhhhhhhccccccchhhhhhhhhhhhhhhccccceeeeeeeceeecccchhhhh 

SEQ KDMKTKYFFFDDGNGLKGNRNVTLTLSWNVVPNAGILPLVTGSGHVSVPFPDTYEITKSY

PRD hhcccceeeeecccccccccceeeeeeeecccccceeeeeccccceeeeccccccccccc 



Prosite for DKFZphfbr2_64jl8.1



PS00001 141->145 ASN_GLYCOSYLATION PDOC00001

PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005 

PS00008 25->31 MYRISTYL PDOC00008 

PS00008 135->141 MYRISTYL PDOC00008 

PS00013 16->27 PROKAR_LIPOPROTEIN PDOC00013 

PS00016 112->115 RGD PDOC00016 

PS00430 l->22 TONB DEPENDENT REC 1 PDOC00354 



(No Pfam data available for DKFZphfbr2_64 jl8 . 1 ) 
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DKFZphfbr2_64k24 



group: transmembrane proteins 

DKFZphfbr2_64k24 encodes a novel 412 amino acid protein with weak similarity to several known 
proteins. 

The novel protein contains 5 transmembrane regions. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



similarity to AMAC1 "testicular condensing enzyme";
membrane regions: 5

Summary: DKFZphfbr2_64k24 encodes a novel 412 amino acid protein with
similarity to AMAC1 (product: "testicular condensing enzyme").

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus : unknown 

Insert length: 1958 bp 

Poly A stretch at pos. 1939, polyadenylation signal at pos. 1918 



1 GGGCCCGCCT CGATTTTCCC AGGCGAGGGC ACGCCCGCGT CAGTCGCCTC 
51 CGGGGCACCT TCCTCGCCAC GACACGCAGG TAACCGGGCC CCGGGAGCCG 
101 GTCGGCGGCG GCGGACTGGG ACCTTGATCC TGCCTGCCCG GCCGCCCGAC 
151 AAGGGAATGA GAGCGGACCC CGAACTCCAC ACACCCGCGT TTAGCCGCCA 
201 CACCTAAGGG GCAGAACAGT CTTTTTGGGT AAGGGCCGGG CTGGGGGCGA 
251 CGCGCCCCGC CCGCTTTGCA GACTTCGGGG TGCTCTGCAC GACGCCTGAA 
301 AGGCCGCGGG GCCCGCATTT CTCTGTGCTG CCCTCCTGGA GAACCGGGAC 
351 ACGGGGACGG GAGGGCCAGC ATCGGCTACG GCCCGGTTTC CCGTTTCTTT 
401 CCTCTGTCGC GTCTGGGCCC TCCTGCAGCG TCCATGATGA AGGCCAGGGG 
451 CTGTTGCTTT CCTCTCGCCC AGTAGCCAAC CCAAGCAAGG GAATTAATTA 
501 TCTGAAGAAA TGGATACTTC TCCCTCCAGA AAATATCCAG TTAAAAAACG 
551 GGTGAAAATA CATCCCAACA CAGTGATGGT GAAATATACT TCTCATTATC 
601 CCCAGCCTGG CGATGATGGA TATGAAGAAA TCAATGAAGG CTATGGGAAT 
651 TTTATGGAGG AAAATCCAAA GAAAGGTCTG CTGAGTGAAA TGAAAAAAAA 
701 AGGGAGAGCT TTCTTTGGAA CCATGGATAC CCTACCTCCA CCAACAGAAG 
751 ACCCAATGAT CAATGAGATT GGACAATTCC AGAGCTTTGC AGAAAAAAAC 
801 ATTTTTCAAT CCCGAAAAAT GTGGATAGTG CTGTTTGGAT CTGCTTTGGC 
851 TCATGGATGT GTAGCTCTTA TCACTAGGCT TGTTTCTGAT CGGTCTAAAG 
901 TTCCATCTCT AGAACTGATT TTTATCCGTT CTGTTTTTCA GGTCTTATCT 
951 GTGTTAGTTG TGTGTTACTA TCAGGAGGCC CCCTTTGGAC CCAGTGGATA 
1001 CAGATTACGA CTCTTCTTTT ATGGTGTATG CAATGTCATT TCTATCACTT 
1051 GTGCTTATAC ATCATTTTCA ATAGTTCCTC CCAGCAATGG GACCACTATG 
1101 TGGAGAGCCA CAACTACAGT CTTCAGTGCC ATTTTGGCTT TTTTACTCGT 
1151 AGATGAGAAA ATGGCTTATG TTGACATGGC TACAGTTGTT TGCAGCATCT
1201 TAGGTGTTTG TCTTGTCATG ATCCCAAACA TTGTTGATGA AGACAATTCT 
1251 TTGTTAAATG CCTGGAAAGA AGCCTTTGGG TACACCATGA CTGTGATGGC 
1301 TGGACTGACC ACTGCTCTCT CAATGATAGT ATACAGATCC ATCAAGGAGA 
1351 AGATCAGCAT GTGGACTGCG CTGTTTACTT TTGGTTGGAC TGGGACAATT 
1401 TGGGGAATAT CTACTATGTT TATTCTTCAA GAACCCATCA TCCCATTAGA 
1451 TGGAGAAACC TGGAGTTATC TCATTGCTAT ATGTGTCTGT TCTACTGCAG 
1501 CATTCTTAGG AGTTTATTAT GCCTTGGACA AATTCCATCC AGCTTTGGTT 
1551 AGCACAGTAC AACATTTGGA GATTGTGGTA GCTATGGTCT TGCAGCTTCT 
1601 CGTGCTGCAC ATATTTCCTA GCATCTATGA TGTTTTTGGA GGGGTAATCA 
1651 TTATGATTAG TGTTTTTGTC CTTGCTGGCT ATAAACTTTA CTGGAGGAAT 
1701 TTAAGAAGGC AGGACTACCA GGAAATACTA GACTCTCCCA TTAAATGAAT 
1751 ACCTGATTAT TATTGTCTCA TTAATGTTCA GTTATTAATA TGTATACTGC 
1801 CATTTTAATG TTTACCTATG AATGTCTTTT GTGTTATATA ACTGACAGAG 
1851 TGCTATAAAA TATATAATAT ATACAAATGC AGAAAATTTA TTCTAGTCTA 
1901 ATATATTCAA ATACAAATAT TAAATATATG AAATACGTTA AAAAAAAAAA 
1951 AAAAAAAA 



BLAST Results 



No BLAST result 
Medline entries



No Medline entry 



Peptide information for frame 3 



ORF from 510 bp to 1745 bp; peptide length: 412
Category: similarity to known protein 



1 MDTSPSRKYP VKKRVKIHPN TVMVKYTSHY PQPGDDGYEE INEGYGNFME 
51 ENPKKGLLSE MKKKGRAFFG TMDTLPPPTE DPMINEIGQF QSFAEKNIFQ 
101 SRKMWIVLFG SALAHGCVAL ITRLVSDRSK VPSLELIFIR SVFQVLSVLV 
151 VCYYQEAPFG PSGYRLRLFF YGVCNVISIT CAYTSFSIVP PSNGTTMWRA 
201 TTTVFSAILA FLLVDEKMAY VDMATVVCSI LGVCLVMIPN IVDEDNSLLN
251 AWKEAFGYTM TVMAGLTTAL SMIVYRSIKE KISMWTALFT FGWTGTIWGI 
301 STMFILQEPI IPLDGETWSY LIAICVCSTA AFLGVYYALD KFHPALVSTV 
351 QHLEIVVAMV LQLLVLHIFP SIYDVFGGVI IMISVFVLAG YKLYWRNLRR 
401 QDYQEILDSP IK 
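The "ORF from ... to ..." lines and the frame numbers can be cross-checked with simple arithmetic. The sketch below is only an illustration, not the annotation pipeline's own code. It assumes 1-based, inclusive, forward-strand coordinates with the stop codon excluded, which holds for every ORF line in this section (for example, 1745 - 510 + 1 = 1236 bases, or 412 codons). The function name `orf_summary` is hypothetical.

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (peptide_length, frame) for a forward-strand ORF.

    Assumes 1-based inclusive coordinates that exclude the stop codon.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start - 1) % 3 + 1  # forward-strand frames are numbered 1-3
    return span // 3, frame


# (start, end) pairs copied from the ORF lines of the five clones in this section
for start, end in [(510, 1745), (389, 688), (73, 1074), (34, 921), (27, 1391)]:
    length, frame = orf_summary(start, end)
    print(f"{start}-{end}: {length} aa, frame {frame}")
```

Each pair reproduces the peptide length and frame number reported for the corresponding clone.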

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64k24, frame 3 

TREMBLNEW:AF016712_1 gene: "AMAC1"; product: "testicular condensing
enzyme"; Mus musculus testicular condensing enzyme (AMAC1) mRNA,
complete cds., N = 1, Score = 191, P = 1.9e-12

TREMBL:BMAJ733_6 product: "hypothetical protein"; Bacillus megaterium
bgaM gene, N = 1, Score = 137, P = 1.6e-06

PIR:G71841 hypothetical protein jhp1155 - Helicobacter pylori (strain
J99), N = 1, Score = 129, P = 1.3e-05



>TREMBLNEW:AF016712_1 gene: "AMAC1"; product: "testicular condensing
enzyme"; Mus musculus testicular condensing enzyme (AMAC1) mRNA, complete
cds.

Length = 362

HSPs: 

Score = 191 (28.7 bits), Expect = 1.9e-12, P = 1.9e-12
Identities = 39/105 (37%), Positives = 66/105 (62%)

Query: 289 FTFGWTGTIWGISTMFILQEPIIPLDGETWSYLIAICVCSTAAFLGVYYALDKFHPALVS 348 

F FG G + + + F+LQ P++P D +WS ++A+ + + +F+ V YA+ K HPALV 
Sbjct: 248 FLFGLVGLMVSVPGLFVLQTPVLPQDTLSWSCVVAVGLLALVSFVCVSYAVTKAHPALVC 307

Query: 349 TVQHLEIVVAMVLQLLVLH--IFPSIYDVFGGVIIMISVFVLAGYKL 393

V H E+VVA++LQ VL+ + PS D+ G +++ S+ ++ L
Sbjct: 308 AVLHSEVVVALMLQYYVLYETVAPS--DIMGAGVVLGSIAIITAQNL 352
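The numbers framing each Query and Sbjct row are the coordinates of the first and last residues in that row. Gap characters do not advance them. The small sketch below shows this. It assumes the second row pair of the HSP above is read without its stray spaces and with each gap run taken as two '-' columns. `row_end` is a hypothetical helper.

```python
def row_end(start: int, aligned: str) -> int:
    """Return the 1-based coordinate of the last residue in an aligned row.

    Gap characters ('-') occupy alignment columns but consume no residues.
    """
    residues = len(aligned) - aligned.count("-")
    return start + residues - 1


query = "TVQHLEIVVAMVLQLLVLH--IFPSIYDVFGGVIIMISVFVLAGYKL"
sbjct = "AVLHSEVVVALMLQYYVLYETVAPS--DIMGAGVVLGSIAIITAQNL"
assert len(query) == len(sbjct)  # both rows span the same alignment columns
print(row_end(349, query), row_end(308, sbjct))  # 393 352
```

Both rows hold 45 residues, so the end coordinates 393 and 352 follow from the start coordinates 349 and 308.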



Pedant information for DKFZphfbr2_64k24, frame 3 



Report for DKFZphfbr2_64k24.3



[LENGTH] 412 

[MW] 46449.87

[pI] 6.99

[HOMOL] TREMBL:AF016712_1 gene: "AMAC1"; product: "testicular condensing enzyme"; Mus

musculus testicular condensing enzyme (AMAC1) mRNA, complete cds. 8e-14

[PROSITE] MYRISTYL 6

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 4

[PROSITE] ASN_GLYCOSYLATION 1

[KW] TRANSMEMBRANE 5 



SEQ MDTSPSRKYPVKKRVKIHPNTVMVKYTSHYPQPGDDGYEEINEGYGNFMEENPKKGLLSE
PRD ccccccccccccceeeecccceeeeeecccccccccceeeeecccccccccccccchhhh
MEM

SEQ MKKKGRAFFGTMDTLPPPTEDPMINEIGQFQSFAEKNIFQSRKMWIVLFGSALAHGCVAL
PRD hhhhcceeecccccccccccccceeeecccchhhhhhhhccceeeeeeeccccchhhhhc
MEM

SEQ ITRLVSDRSKVPSLELIFIRSVFQVLSVLVVCYYQEAPFGPSGYRLRLFFYGVCNVISIT
PRD chhhhhccccccccchhhhhhhhhhhheeeeeeeccccccccceeeeeeeecceeeeeee
MEM MMMMMMMMMMMMMMMMM

SEQ CAYTSFSIVPPSNGTTMWRATTTVFSAILAFLLVDEKMAYVDMATVVCSILGVCLVMIPN
PRD eccceeeeccccccceeeeeehhhhhhhhhhhhhhhhheeeeeeeeeeeeeeeeeeeecc
MEM

SEQ IVDEDNSLLNAWKEAFGYTMTVMAGLTTALSMIVYRSIKEKISMWTALFTFGWTGTIWGI
PRD cccccchhhhhhhhhhhheeeeeeehhhhhhhcchhhhhhhhhhhhccccccccceeeec
MEM MMMMMMMMMMMMMMMMMMM

SEQ STMFILQEPIIPLDGETWSYLIAICVCSTAAFLGVYYALDKFHPALVSTVQHLEIVVAMV
PRD ceeeeeecccccccccceeeeeccchhhhhhhhhccccccccccchhhhhhhhhhhhhhh
MEM MMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMM

SEQ LQLLVLHIFPSIYDVFGGVIIMISVFVLAGYKLYWRNLRRQDYQEILDSPIK
PRD hhhhhhhhhccccccceeeeeeeeeecccccchhhhhhhhhhhhhhhccccc
MEM MMMMMMM. . . . MMMMMMMMMMMMMMMMMMMMM



Prosite for DKFZphfbr2_64k24.3



PS00001 193->197 ASN_GLYCOSYLATION PDOC00001

PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005

PS00005 101->104 PKC_PHOSPHO_SITE PDOC00005

PS00005 126->129 PKC_PHOSPHO_SITE PDOC00005

PS00005 277->280 PKC_PHOSPHO_SITE PDOC00005

PS00006 92->96 CK2_PHOSPHO_SITE PDOC00006

PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006

PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006

PS00008 70->76 MYRISTYL PDOC00008

PS00008 88->94 MYRISTYL PDOC00008

PS00008 110->116 MYRISTYL PDOC00008

PS00008 265->271 MYRISTYL PDOC00008

PS00008 295->301 MYRISTYL PDOC00008

PS00008 334->340 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_64k24 . 3) 
DKFZphfbr2_6a17



group: brain derived 

DKFZphfbr2_6a17 encodes a novel 100 amino acid protein with very weak similarity to human
finger protein zfOC1.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 1424 bp 

Poly A stretch at pos. 1405, polyadenylation signal at pos. 1389 

1 GGGACTGAGG GGGTGGGCTT ACTCCCTGGG CAGTCTTGGG GGCCAGAGCT 
51 GAGGCCAGTC CATATTACAG TGGCTGGGCT GTTTTTTTCA GTAGCCCCTA 

101 GCATTGGCTG GGATTCCTGT TCCTGGGTGC GCCTCCACCT CCCTTCTGAT 

151 GCTTCCTGGC TATGGTGGGG TGGGAACCTC AGTTTCCCCC AAAGTCTTCC 

201 CTGGATGCTG GCTTCAGGTT GAAGACCCTG GTTCTTCCAG TTCCTCACGG 

251 GTTAGGTAGG GGCTCCTGCA TCACCTTCAG AATCAGTTCC AACCCCCACT 

301 CTCCTTAGGC TTTGTGCTCT GCTCTGCCCT GCCAGGCTGC CCTTGTCCAT 

351 GTGAGTAGCA TGGGCGGGTG GTGGGGACGG CAGTGGTGAT GAAGGGGGTG 

401 CACCACAGGC CTCATGAAGC AGTTCCCACA TGGGCGTGTG GCTGGGGCGT 

451 GGCCACCACA GAGCACATGG CTGTGTCTAG GCGCAAGCAC TTTAGCAGTA

501 TCTGTTTACA TGCGCAAGGA TCAAGCCGAC TACCTGTGCT GTCTACTGGG 

551 ACAGCAGTCT CCGAGCTACT CCGTACCTCC CTCTGCCAGG TCGTGGAGTT 

601 AGGCCCCAGT CCCTACTTGT CACTGGTTCC CACTGTGCTC CTAACTGTGC 

651 AGCACCTGGG AGCTCTGGCC TGGGGCTGGA GGCCCTGGTA GGAGCTGCAG 

701 TTGGAGGCCG TTCTGTGCCC AGCAGCGGTG AGCGGCTCCC ATGGGCCCTG 

751 TGTCTGCAGG GAGCCAGGGC TGCGGCACAT GTGCTGTGAA ACTGGCACCC 

801 ACCTGGCGTG CTGCTGCCGC CACTTGCTTC CTGCAGCACC TCCTACCCTG 

851 CTCCGTGTCC TCCCTCTCCC CGCGCCTGGC TCAGGAGTGC TGGAAAAGCT 

901 CACGCCTCGG CCTGGGAGCC TGGCCTCTTG ATATACCTCG AGCTTCCCCT 

951 GTGCTCCCCA GCCCCAGGAC CACTGGCCCC TTGGCCTGAG GGGCTGGGGG 
1001 CCCCACGACC TGCAGCGTCG AGTCCGGGAG AGAGCCCGGA GCGGCGTGCC 
1051 ATCTCGGCTC GGCCTTGCTG AGAGCCTCCG CCCTGGCTTT CTCCCTGTCT 
1101 GGTTTCAGTG GCTCACGTTG GTGCTACACA GCTAGAATAG ATATATTTAG
1151 AGAGAGAGAT ATTTTTAAGA CAAAGCCCAC AATTAGCTGT CCTTTAACAC 
1201 CGCAGAACCC CCTCCCAGAA GAAGAGCGAT CCCTCGGACG GTCCGGGCGG 
1251 GCACCCTCAG CCGGGCTCTT TGCAGAAGCA GCACCGCTGA CTGTGGGCCC 
1301 GGCCCTCAGA TGTGTACATA TACGGCTATT TCCTATTTTA CTGTTCTTCA 
1351 GATTTAGTAC TTGTAAATAA ACACACACAT TAAGGAGAGA TTAAACATTT 
1401 TTGCCAAAAA AAAAAAAAAA AAAA 

BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 389 bp to 688 bp; peptide length: 100 
Category: putative protein 



1 MKGVHHRPHE AVPTWACGWG VATTEHMAVS RRKHFSSICL HAQGSSRLPV 
51 LSTGTAVSEL LRTSLCQVVE LGPSPYLSLV PTVLLTVQHL GALAWGWRPW 

BLASTP hits
Entry S70007 from database PIR:

finger protein zfOC1 - human (fragment)

Length = 183

Score = 62 (21.8 bits), Expect = 0.24, Sum P(2) = 0.22
Identities = 18/47 (38%), Positives = 24/47 (51%) 
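The percentages printed next to the Identities and Positives counts can be reproduced from the raw counts. In this section they are consistent with truncation rather than rounding. For example, 66/105 is 62.9% but is reported as 62%, and 71/200 is 35.5% but is reported as 35%. This is inferred from the numbers here, not taken from BLAST documentation. The helper `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, length: int) -> int:
    """Integer percentage, truncated toward zero (both inputs are positive)."""
    return (100 * matches) // length


# (matches, alignment length) pairs copied from hits in this section
pairs = [(18, 47), (24, 47), (39, 105), (66, 105), (71, 200), (151, 276), (375, 384)]
print([blast_percent(m, n) for m, n in pairs])  # [38, 51, 37, 62, 35, 54, 97]
```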



Alert BLASTP hits for DKFZphfbr2_6a17, frame 2
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_6a17, frame 2

Report for DKFZphfbr2_6a17.2

[LENGTH] 100

[MW] 10944.82

[pI] 9.49

[PROSITE] MYRISTYL 2 

[PROSITE] PKC_PHOSPHO_SITE 2 

[KW] Alpha_Beta 

SEQ MKGVHHRPHEAVPTWACGWGVATTEHMAVSRRKHFSSICLHAQGSSRLPVLSTGTAVSEL
PRD cccccccccccccccccccccchhhhhhhhhhcccccceeeccccccceeecccchhhhh

SEQ LRTSLCQVVELGPSPYLSLVPTVLLTVQHLGALAWGWRPW
PRD hhhhheeeeecccccceeecchhhhhhhhhchhhhhcccc 



Prosite for DKFZphfbr2_6a17.2

PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005

PS00005 45->48 PKC_PHOSPHO_SITE PDOC00005

PS00008 20->26 MYRISTYL PDOC00008 

PS00008 54->60 MYRISTYL PDOC00008 
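The Prosite tables list pattern hits. These can be approximated by translating PROSITE patterns into regular expressions. The patterns below are the published PROSITE patterns for the accessions named in these tables. The helper names are hypothetical. The reading of the "start->end" notation is an inference from the tables: the end appears to be one past the last motif residue, so a 4-residue N-glycosylation motif is listed as 141->145. The lookahead lets overlapping hits be reported. The sketch is run on the 100-residue frame 2 peptide above.

```python
import re

# PROSITE patterns rewritten as regular expressions ({X} -> [^X], x -> .)
PATTERNS = {
    "ASN_GLYCOSYLATION": "N[^P][ST][^P]",         # PS00001
    "PKC_PHOSPHO_SITE": "[ST].[RK]",              # PS00005
    "CK2_PHOSPHO_SITE": "[ST]..[DE]",             # PS00006
    "MYRISTYL": "G[^EDRKHPFYW]..[STAGCN][^P]",    # PS00008
    "AMIDATION": ".G[RK][RK]",                    # PS00009
    "RGD": "RGD",                                 # PS00016
}


def scan(seq: str) -> dict[str, list[tuple[int, int]]]:
    """Return 1-based (start, start + motif_length) pairs per site type."""
    hits = {}
    for name, pattern in PATTERNS.items():
        # zero-width lookahead with a capture group finds overlapping matches
        regex = re.compile(f"(?=({pattern}))")
        found = [(m.start() + 1, m.start() + 1 + len(m.group(1)))
                 for m in regex.finditer(seq)]
        if found:
            hits[name] = found
    return hits


seq_frame2 = ("MKGVHHRPHEAVPTWACGWGVATTEHMAVSRRKHFSSICLHAQGSSRLPV"
              "LSTGTAVSELLRTSLCQVVELGPSPYLSLVPTVLLTVQHLGALAWGWRPW")
for name, sites in scan(seq_frame2).items():
    print(name, sites)
```

This prints PKC_PHOSPHO_SITE [(30, 33), (45, 48)] and MYRISTYL [(20, 26), (54, 60)], the same four entries as the table above.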



(No Pfam data available for DKFZphfbr2_6a17.2)

DKFZphfbr2_6b24



group: metabolism 

DKFZphfbr2_6b24 encodes a novel 334 amino acid protein with similarity to several bacterial
dTDP-4-dehydrorhamnose reductases (EC 1.1.1.133). 

The novel protein seems to be a human enzyme similar to dTDP-4-dehydrorhamnose reductases.
EC 1.1.1.133 catalyses the reaction: dTDP-6-deoxy-L-mannose + NADP(+) <=>
dTDP-4-dehydro-6-deoxy-L-mannose + NADPH.

The new protein can find application in modulation of rhamnose metabolism and as a new enzyme
for biotechnological production processes.



similar to dTDP-6-deoxy-L-mannose-dehydrogenases 
complete cDNA, EST hits, complete cds 

nucleotide sugar metabolism; seems to be a dehydrogenase
localisation: region of primer A missing 

Sequenced by AGOWA 

Locus: /map="5'* 

Insert length: 2054 bp

Poly A stretch at pos. 2028, polyadenylation signal at pos. 2015 



1 GGGGGAGGCC CGCGTCGATC CTGGGTTGGA GGAGGTGGCG GCCGCTGAGG 

51 CTGCGGCGTG AAGACGGCGG GCATGGTGGG GCGGGAGAAA GAGCTCTCTA 

101 TACACTTTGT TCCCGGGAGC TGTCGGCTGG TGGAGGAGGA AGTTAACATC 

151 CCTAATAGGA GGGTTCTGGT TACTGGTGCC ACTGGGCTTC TTGGCAGAGC 

201 TGTACACAAA GAATTTCAGC AGAATAATTG GCATGCAGTT GGCTGTGGTT 

251 TCAGAAGAGC AAGACCAAAA TTTGAACAGG TTAATCTGTT GGATTCTAAT 

301 GCAGTTCATC ACATCATTCA TGATTTTCAG CCCCATGTTA TAGTACATTG 

351 TGCAGCAGAG AGAAGACCAG ATGTTGTAGA AAATCAGCCA GATGCTGCCT 

401 CTCAACTTAA TGTGGATGCT TCTGGGAATT TAGCAAAGGA AGCAGCTGCT 

451 GTTGGAGCAT TTCTCATCTA CATTAGCTCA GATTATGTAT TTGATGGAAC

501 AAATCCACCT TACAGAGAGG AAGACATACC AGCTCCCCTA AATTTGTATG 

551 GCAAAACAAA ATTAGATGGA GAAAAGGCTG TCCTGGAGAA CAATCTAGGA 

601 GCTGCTGTTT TGAGGATTCC TATTCTGTAT GGGGAAGTTG AAAAGCTCGA 

651 AGAAAGTGCA GTGACTGTTA TGTTTGATAA AGTGCAGTTC AGCAACAAGT 

701 CAGCAAACAT GGATCACTGG CAGCAGAGGT TCCCCACACA TGTCAAAGAT 

751 GTGGCCACTG TGTGCCGGCA GCTAGCAGAG AAGAGAATGC TGGATCCATC 

801 AATTAAGGGA ACCTTTCACT GGTCTGGCAA TGAACAGATG ACTAAGTATG 

851 AAATGGCATG TGCAATTGCA GATGCCTTCA ACCTCCCCAG CAGTCACTTA 

901 AGACCTATTA CTGACAGCCC TGTCCTAGGA GCACAACGTC CGAGAAATGC 

951 TCAGCTTGAC TGCTCCAAAT TGGAGACCTT GGGCATTGGC CAACGAACAC 

1001 CATTTCGAAT TGGAATCAAA GAATCACTTT GGCCTTTCCT CATTGACAAG 

1051 AGATGGAGAC AAACGGTCTT TCATTAGTTT ATTTGTGTTG GGTTCTTTTT 

1101 TTTTTTAAAT GAAAAGTATA GTATGTGGCC CTTTTTAAAG AACAAAGGAA 

1151 ATAGTTTTGT ATGAGTACTT TAATTGTGAC TCTTAGGATC TTTCAGGTAA 

1201 ATGATGCTCT TGCACTAGTG AAATTGTCTA AAGAAACTAA AGGGCAGTCA 

1251 TGCCCTGTTT GCAGTAATTT TTCTTTTTAT CATTATGTTT GTCCTGGCTA 

1301 AACTTGGAGT TTGAGTATAG TAAATTATGA TCCTTAAATA TTTGAGGGTC 

1351 AGGATGAAGC AGATCTGCTG TAGACTTTTC AGATGAAATT GTTCATTCTC 

1401 GTAACCTCCA TATTTTCAGG ATTTTTGAAG CTGTTGACCA TTTCATGTTG 

1451 ATTATTTTAA ATTGTGTGGA ATAGTATAAA AATCATTGGT GTTCATTATT 

1501 TGCTTTGCCT GAGCTCAGAT CAAAATGTTT GAAGAAAGGA ACTTTATTTT 

1551 TGCAAGTTAC GTACAGTTTT TATGCTTGAG ATATTTCAAC ATGTTATGTA 

1601 TATTGGAACT TCTACAGCTT GATGCCTCCT GCTTTTATAG CAGTTTATGG 

1651 GGAGCACTTG AAAGAGCGTG TGTACATGTA TTTTTTTTCT AGGCAAACAT 

1701 TGAATGCAAA CGTGTATTTT TTTAATATAA ATATATAACT GTCCTTTTCA 

1751 TCCCATGTTG CCGCTAAGTG ATATTTCATA TGTGTGGTTA TACTCATAAT

1801 AATGGGCCTT GTAAGTCTTT TCACCATTCA TGAATAATAA TAAATATGTA 

1851 CTGCTGGCAT GTAATGCTTA GTTTTCTTGT ATTTACTTCT TTTTTTTAAA 

1901 TGTAAGGACC AAACTTCTAA ACTAATTGTT CTTTTGTTGC TTTAATTTTT 

1951 AAAAATTACA TTCTTCTGAT GTAACATGTG ATACATACAA AAGAATATAG

2001 TTTAATATGT ATTGAAATAA AACACAATAA AATTAAAAAA AAAAAAAAAA 

2051 AAAA 



BLAST Results 



Entry G37115 from database EMBL: 
SHGC-56899 Human Homo sapiens STS genomic. 
Score = 446, P = 4.6e-14, identities = 90/91
Medline entries



99109950: 

The metabolism of 6-deoxyhexoses in bacterial and animal 
cells. 



Peptide information for frame 1 



ORF from 73 bp to 1074 bp; peptide length: 334
Category: similarity to known protein 



1 MVGREKELSI HFVPGSCRLV EEEVNIPNRR VLVTGATGLL GRAVHKEFQQ 
51 NNWHAVGCGF RRARPKFEQV NLLDSNAVHH IIHDFQPHVI VHCAAERRPD 
101 VVENQPDAAS QLNVDASGNL AKEAAAVGAF LIYISSDYVF DGTNPPYREE 
151 DIPAPLNLYG KTKLDGEKAV LENNLGAAVL RIPILYGEVE KLEESAVTVM 
201 FDKVQFSNKS ANMDHWQQRF PTHVKDVATV CRQLAEKRML DPSIKGTFHW 
251 SGNEQMTKYE MACAIADAFN LPSSHLRPIT DSPVLGAQRP RNAQLDCSKL 
301 ETLGIGQRTP FRIGIKESLW PFLIDKRWRQ TVFH 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_6b24, frame 1 

PIR:T00104 probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133) -
Actinobacillus actinomycetemcomitans, N = 1, Score = 293, P = 6.4e-26

TREMBL:SSU51197_21 gene: "rhsD"; product:
"dTDP-6-deoxy-L-mannose-dehydrogenase"; Sphingomonas S88 sphingan
polysaccharide synthesis (spsG), (spsS), (spsR), glycosyl transferase
(spsQ), (spsI), glycosyl transferase (spsK), glycosyl transferase
(spsL), (spsJ), (spsF), (spsD), (spsC), (spsE), Urf 32, Urf 26,
ATP-binding cassette trans., N = 1, Score = 291, P = 1e-25

SWISSPROT:RFBD_RHISN PROBABLE DTDP-4-DEHYDRORHAMNOSE REDUCTASE (EC
1.1.1.133) (DTDP-4-KETO-L-RHAMNOSE REDUCTASE) (DTDP-6-DEOXY-L-MANNOSE
DEHYDROGENASE) (DTDP-L-RHAMNOSE SYNTHETASE)., N = 1, Score = 283, P =
7.4e-25



>PIR:T00104 probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133) - 
Actinobacillus actinomycetemcomitans 
Length = 294

HSPs:

Score = 293 (44.0 bits), Expect = 6.4e-26, P = 6.4e-26
Identities = 89/276 (32%), Positives = 151/276 (54%)

Query: 30 RVLVTGATGLLGRAVHKEFQQNNWHAVGCGFRRARPKFEQVNLLDSNAVHHIIHDFQPHV 89 

R+L+TGA G LGR++ K N + V F ++++ + + V II F+P+V 

Sbjct: 3 RLLITGAGGQLGRSLAKLLVDNGRYEV------LALDFSELDITNKDMVFSIIDSFKPNV 56

Query: 90 IVHCAAERRPDVVENQPDAASQLNVDASGNLAKEAAAVGAFLIYISSDYVFDG-TNPPYR 148

I++ AA D E + +A +NV LA+ A + ++++S+DYVFDG + Y+ 

Sbjct: 57 IINAAAYTSVDQAELEVSSAYSVNVRGVQYLAEAAIRHNSAILHVSTDYVFDGYKSGKYK 116 

Query: 149 EEDIPAPLNLYGKTKLDGEKAVLENNLGAAVLRIPILYGEVEKLEESAVTVMFDKVQFSN 208 

E DI PL +YGK+K +GE+ +L + + +LR +GE + V M ++ + 

Sbjct: 117 ETDIIHPLCVYGKSKAEGERLLLTLSPKSIILRTSWTFGEYGN---NFVKTML-RLAKNR 172

Query: 209 KSANMDHWQQRFPTHVKDVATVCRQLAEKRMLDPSIK-GTFHWSGNEQMTKYEMACAIAD 267

+ Q PT+ D+A+V Q+AEK ++ ++K G +H++G ++ Y+ A AI D 
Sbjct: 173 DILGVVADQIGGPTYSGDIASVLIQIAEKIIVGETVKYGIYHFTGEPCVSWYDFAIAIFD 232

Query: 268 AF-------NLPSSHLRPITDSPVLGAQRPRNAQLDCSKLE-TLGI 305

N+P + D P L A+RP N+ LD +K++ GI 

Sbjct: 233 EAVAQKVLENVPLVNAITTADYPTL-AKRPANSCLDLTKIQQAFGI 277 
Pedant information for DKFZphfbr2_6b24, frame 1



Report for DKFZphfbr2_6b24.1

[LENGTH] 334

[MW] 37551.98

[pI] 6.90

[HOMOL] PIR:T00104 probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133) -
Actinobacillus actinomycetemcomitans 6e-25

[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YGL001c]
6e-04

[EC] 1.1.1.133 dTDP-4-dehydrorhamnose reductase 2e-16

[PIRKW] lipopolysaccharide biosynthesis 2e-16

[PIRKW] NADP 2e-16

[PIRKW] oxidoreductase 2e-16

[PIRKW] streptomycin biosynthesis 1e-19

[SUPFAM] dTDP-dihydrostreptose synthase 1e-20

[PROSITE] MYRISTYL 1

[PROSITE] CK2_PHOSPHO_SITE 4

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta 



SEQ MVGREKELSIHFVPGSCRLVEEEVNIPNRRVLVTGATGLLGRAVHKEFQQNNWHAVGCGF

PRD cccccceeeccccccceeeeecccccccceeeeeccccchhhhhhhhhhhccceeeeecc

SEQ RRARPKFEQVNLLDSNAVHHIIHDFQPHVIVHCAAERRPDVVENQPDAASQLNVDASGNL

PRD cccccccccccccchhhhhhhhhhhccceeeehhhhhhhhhhhhhhhhhhhhhhccchhh

SEQ AKEAAAVGAFLIYISSDYVFDGTNPPYREEDIPAPLNLYGKTKLDGEKAVLENNLGAAVL

PRD hhhhhhhhheeeeeeccccccccccccccccccccccccchhhhhhhhhccccccceeee 

SEQ RIPILYGEVEKLEESAVTVMFDKVQFSNKSANMDHWQQRFPTHVKDVATVCRQLAEKRML 

PRD eeeeeecccccccchhhhhhhhhhhhhccceeeccccccccccchhhhhhhhhhhhhhhh 

SEQ DPSIKGTFHWSGNEQMTKYEMACAIADAFNLPSSHLRPITDSPVLGAQRPRNAQLDCSKL 

PRD cccccceeeeccccccchhhhhhhhhhhhhcccccccccccccccccccccccchhhhhh 

SEQ ETLGIGQRTPFRIGIKESLWPFLIDKRWRQTVFH 

PRD hhhhccccchhhhhhhhhhhhhhhhhhhhhcccc 



Prosite for DKFZphfbr2_6b24.1

PS00001 208->212 ASN_GLYCOSYLATION PDOC00001

PS00005 16->19 PKC_PHOSPHO_SITE PDOC00005

PS00005 207->210 PKC_PHOSPHO_SITE PDOC00005

PS00005 243->246 PKC_PHOSPHO_SITE PDOC00005

PS00006 162->166 CK2_PHOSPHO_SITE PDOC00006

PS00006 251->255 CK2_PHOSPHO_SITE PDOC00006

PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006

PS00006 298->302 CK2_PHOSPHO_SITE PDOC00006

PS00008 314->320 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_6b24.1)






WO 01/12659 



PCT/IB00/01496 



DKFZphfbr2_6i20



group: brain derived 

DKFZphfbr2_6i20 encodes a novel 296 amino acid protein with similarity to ribosomal protein
L15 precursor of S. cerevisiae mitochondria.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to ribosomal protein L15 precursor, mitochondrial 

complete cDNA, complete cds, EST hits 
potential mitochondrial L15 ribosomal protein

Sequenced by AGOWA 

Locus: /map="377.5 cR from top of Chr8 linkage group"
Insert length: 1122 bp 

Poly A stretch at pos. 1099, polyadenylation signal at pos. 1071

1 GGGGGCCCTT GAAAGTTCTT GGATCTGCGG GTTATGGCCG GTCCCTTGCA 
51 GGGCGGTGGG GCCCGGGCCC TGGACCTACT CCGGGGCCTG CCGCGTGTGA 

101 GCCTGGCCAA CTTAAAGCCG AATCCCGGCT CCAAGAAACC GGAGAGAAGA 

151 CCAAGAGGTC GGAGAAGAGG TAGAAAATGT GGCAGAGGCC ATAAAGGAGA 

201 AAGGCAAAGA GGAACCCGGC CCCGCTTGGG CTTTGAGGGA GGCCAGACTC 

251 CATTTTACAT CCGAATCCCA AAATACGGGT TTAACGAAGG ACATAGTTTC 

301 AGACGCCAGT ATAAGCCTAT GAGTCTCAAT AGACTGCAGT ATCTTATTGA 

351 TTTGGGTCGT GTTGATCCTA GTCAACCTAT TGACTTAACC CAGCTTGTCA 
401 ATGGGAGAGG TGTGACCATC CAGCCACTTA AAAGGGATTA TGATGTCCAG

451 CTGGTTGAGG AGGGTGCTGA CACCTTTACG GCAAAAGTTA ATATTGAAGT 

501 ACAGTTGGCT TCAGAACTAG CTATTGCTGC CATTGAAAAA AATGGTGGTG 

551 TTGTTACTAC AGCCTTCTAT GATCCAAGAA GTCTGGACAT TGTATGCAAA 

601 CCTGTTCCAT TCTTTCTTCG TGGACAACCC ATTCCAAAAA GAATGCTTCC 

651 ACCAGAAGAA CTGGTACCAT ATTACACTGA TGCAAAGAAC CGTGGGTACC 

701 TGGCGGATCC TGCCAAATTT CCTGAAGCAC GACTTGAACT CGCCAGGAAG 

751 TATGGTTATA TCTTACCTGA TATCACTAAA GATGAACTCT TCAAAATGCT 

801 CTGTACTAGG AAGGATCCAA GGCAGATTTT CTTTGGTCTT GCTCCAGGAT 

851 GGGTGGTGAA TATGGCCGAT AAGAAAATCC TAAAACCTAC AGATGAAAAT 

901 CTCCTTAAGT ATTATACCTC ATGAATTCCC GTCCAAGGAA GCAGAGTTGT 

951 TAAAGAGTAC TGGAATAGGG GCTGAAGGAT CTATATTCCC TTATTGCATT 
1001 TTCCTTATGT ATAATTTTCC AGATGGTGAT GTTACTTTTC AGTGTACTCA 
1051 TATGTCTCAT TTTCATCTAA AATTAAATGG CAGGAAACAA GGACTGCATA 
1101 GAGAAAAAAA AAAAAAAAAA AA 

BLAST Results 



Entry HS500354 from database EMBL: 
human STS WI-12392. 
Length = 426 
Minus Strand HSPs: 

Score = 1791 (268.7 bits), Expect = 1.1e-74, P = 1.1e-74
Identities = 375/384 (97%)



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 34 bp to 921 bp; peptide length: 296 
Category: strong similarity to known protein 



1 MAGPLQGGGA RALDLLRGLP RVSLANLKPN PGSKKPERRP RGRRRGRKCG

51 RGHKGERQRG TRPRLGFEGG QTPFYIRIPK YGFNEGHSFR RQYKPMSLNR

101 LQYLIDLGRV DPSQPIDLTQ LVNGRGVTIQ PLKRDYDVQL VEEGADTFTA 

151 KVNIEVQLAS ELAIAAIEKN GGVVTTAFYD PRSLDIVCKP VPFFLRGQPI 

201 PKRMLPPEEL VPYYTDAKNR GYLADPAKFP EARLELARKY GYILPDITKD

251 ELFKMLCTRK DPRQIFFGLA PGWVVNMADK KILKPTDENL LKYYTS

BLASTP hits 

Entry S63258 from database PIR: 

ribosomal protein L15 precursor, mitochondrial - yeast (Saccharomyces

cerevisiae) 

Length = 322 

Score = 259 (91.2 bits), Expect = 2.0e-22, P = 2.0e-22
Identities = 71/200 (35%), Positives = 106/200 (53%)

Entry H70161 from database PIR:

ribosomal protein L15 (rplO) - Lyme disease spirochete
Length = 145

Score = 173 (60.9 bits), Expect = 4.8e-13, P = 4.8e-13
Identities = 45/140 (32%), Positives = 73/140 (52%)



Alert BLASTP hits for DKFZphfbr2_6i20, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_6i20, frame 1 



Report for DKFZphfbr2_6i20.1

[LENGTH] 296

[MW] 33495.98

[pI] 9.98

[HOMOL] TREMBL:AF067212_1 gene: "F37F2.1"; Caenorhabditis elegans cosmid F37F2. 1e

[FUNCAT] 05.01 ribosomal proteins [S. cerevisiae, YNL284c] 7e-15

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YNL284c] 7e-15

[FUNCAT] j mRNA translation and ribosome biogenesis [M. genitalium, MG169] 1e-06

[BLOCKS] BL00475D

[BLOCKS] BL00475B Ribosomal protein L15 proteins

[PIRKW] ribosome 2e-13

[PIRKW] mitochondrion 2e-13

[PIRKW] protein biosynthesis 2e-13

[SUPFAM] Escherichia coli ribosomal protein L15 4e-06

[PROSITE] MYRISTYL 3

[PROSITE] AMIDATION 2

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 4

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 12.50 %
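This sketch assumes the LOW_COMPLEXITY keyword reports the share of residues masked by SEG (the 'x' marks in the SEG track). Under that assumption, 12.50 % of this 296-residue protein corresponds to exactly 37 masked positions, because 37/296 = 0.125. The helper `low_complexity_percent` is hypothetical.

```python
def low_complexity_percent(masked: int, length: int) -> str:
    """Format the share of SEG-masked residues in the style of the [KW] line."""
    return f"{100 * masked / length:.2f} %"


print(low_complexity_percent(37, 296))  # 12.50 %
```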

SEQ MAGPLQGGGARALDLLRGLPRVSLANLKPNPGSKKPERRPRGRRRGRKCGRGHKGERQRG 
SEG xxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxx . . . 



PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ TRPRLGFEGGQTPFYIRIPKYGFNEGHSFRRQYKPMSLNRLQYLIDLGRVDPSQPIDLTQ 

SEG 

PRD ccccccccccccceeeeeccccccccccccccccccchhhhhhhhhccccccccccccee 

SEQ LVNGRGVTIQPLKRDYDVQLVEEGADTFTAKVNIEVQLASELAIAAIEKNGGVVTTAFYD 

SEG 

PRD ecccceeeeccccccceeeeeeccccccchhhhhhhhhhhhhhhhhhhhccceeeeeecc 

SEQ PRSLDIVCKPVPFFLRGQPIPKRMLPPEELVPYYTDAKNRGYLADPAKFPEARLELARKY

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhh 

SEQ GYILPDITKDELFKMLCTRKDPRQIFFGLAPGWVVNMADKKILKPTDENLLKYYTS 

SEG 

PRD cccccccchhhhhhhhhcccccceeeeeccccceeeeccceeecccchhhhhcccc 



Prosite for DKFZphfbr2_6i20.1

PS00005 33->36 PKC_PHOSPHO_SITE PDOC00005

PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005

PS00005 149->152 PKC_PHOSPHO_SITE PDOC00005

PS00005 258->261 PKC_PHOSPHO_SITE PDOC00005

PS00006 248->252 CK2_PHOSPHO_SITE PDOC00006

PS00006 258->262 CK2_PHOSPHO_SITE PDOC00006

PS00008 8->14 MYRISTYL PDOC00008

PS00008 171->177 MYRISTYL PDOC00008

PS00008 268->274 MYRISTYL PDOC00008

PS00009 41->45 AMIDATION PDOC00009

PS00009 45->49 AMIDATION PDOC00009



(No Pfam data available for DKFZphfbr2_6i20.1) 
DKFZphfbr2_6o17



group: nucleic acid management 

DKFZphfbr2_6o17 encodes a novel 455 amino acid protein with strong similarity to the DEAD-box
ATP-dependent RNA helicases YHR065c and T26G10.1.

The S. cerevisiae protein YHR065c is required for maturation of the 35S RNA primary
transcript.

The new protein can find application in modulating rRNA maturation.

strong similarity to RNA helicases

complete cDNA, complete cds, EST hits

probable start at bp 27 matches Kozak consensus ANNatgG

involved in maturation of rRNA??

YHR065c/Rrp3p is involved in maturation of the 35S primary transcript
Drs1p cold-sensitive mutation has slow 27S to 25S pre-rRNA
conversion and is deficient in 60S ribosomal subunits
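The start-codon note can be checked directly. The pattern ANNatgG requires an A three bases upstream of the ATG (position -3) and a G immediately after it (+4). The minimal sketch below tests only these two positions. `matches_annatgg` is a hypothetical helper, and its input is bases 1-40 of the cDNA listed below.

```python
def matches_annatgg(cdna: str, atg_pos: int) -> bool:
    """Check the ANNatgG context around a 1-based ATG position."""
    i = atg_pos - 1  # 0-based index of the A in ATG
    if i < 3 or i + 3 >= len(cdna):
        return False  # not enough flanking sequence to judge
    return (cdna[i:i + 3] == "ATG"
            and cdna[i - 3] == "A"   # position -3
            and cdna[i + 3] == "G")  # position +4


five_prime = "GGGGACTTCCGGAGACCTCACACAAGATGGCGGCACCCGA"  # bases 1-40
print(matches_annatgg(five_prime, 27))  # True
```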

Sequenced by AGOWA 

Locus : unknown 

Insert length: 1840 bp 

Poly A stretch at pos. 1815, polyadenylation signal at pos. 1793 

1 GGGGACTTCC GGAGACCTCA CACAAGATGG CGGCACCCGA GGAACACGAT 
51 TCTCCGACCG AAGCGTCCCA GCCGATTGTG GAAGAGGAGG AAACTAAAAC 

101 ATTTAAAGAC CTGGGTGTGA CAGATGTGTT GTGTGAAGCT TGTGACCAGT 

151 TGGGATGGAC AAAACCCACC AAGATTCAGA TTGAAGCTAT TCCTTTGGCC 

201 TTACAAGGTC GTGATATCAT TGGGCTTGCA GAAACTGGCT CTGGAAAGAC 

251 AGGCGCCTTT GCTTTGCCCA TTCTAAACGC ACTGCTGGAG ACCCCGCAGC 

301 GTTTGTTTGC CCTAGTTCTT ACCCCGACTC GGGAGCTGGC CTTTCAGATC 

351 TCAGAGCAGT TTGAAGCCCT GGGGTCCTCT ATTGGAGTGC AGAGTGCTGT 

401 GATTGTAGGT GGAATTGATT CAATGTCTCA ATCTTTGGCC CTTGCAAAAA 

451 AACCACATAT AATAATAGCA ACTCCTGGTC GACTGATTGA CCACTTGGAA 

501 AATACGAAAG GTTTCAACTT GAGAGCTCTC AAATACTTGG TCATGGATGA 

551 AGCCGACCGA ATACTGAATA TGGATTTTGA GACAGAGGTT GACAAGATCC 

601 TCAAAGTGAT TCCTCGAGAT CGGAAAACAT TCCTCTTCTC TGCCACCATG 

651 ACCAAGAAGG TTCAAAAACT TCAGCGAGCA GCTCTGAAGA ATCCTGTGAA 

701 ATGTGCCGTT TCCTCTAAAT ACCAGACAGT TGAAAAATTA CAGCAATATT 

751 ATATTTTTAT TCCCTCTAAA TTCAAGGATA CCTACCTGGT TTATATTCTA 

801 AATGAATTGG CTGGAAACTC CTTTATGATA TTCTGCAGCA CCTGTAATAA 

851 TACCCAGAGA ACAGCTTTGC TACTGCGAAA TCTTGGCTTC ACTGCCATCC 

901 CCCTCCATGG ACAAATGAGT CAGAGTAAGC GCCTAGGATC CCTTAATAAG 

951 TTTAAGGCCA AGGCCCGTTC CATTCTTCTA GCAACTGACG TTGCCAGCCG 
1001 AGGTTTGGAC ATACCTCATG TAGATGTGGT TGTCAACTTT GACATTCCTA 
1051 CCCATTCCAA GGATTACATC CATCGAGTAG GTCGAACAGC TAGAGCTGGG 
1101 CGCTCCGGAA AGGCTATTAC TTTTGTCACA CAGTATGATG TGGAACTCTT 
1151 CCAGCGCATA GAACACTTAA TTGGGAAGAA ACTACCAGGT TTTCCAACAC 
1201 AGGATGATGA GGTTATGATG CTGACAGAAC GCGTCGCTGA AGCCCAAAGG 
1251 TTTGCCCGAA TGGAGTTAAG GGAGCATGGA GAAAAGAAGA AACGCTCGCG 
1301 AGAGGATGCT GGAGATAATG ATGACACAGA GGGTGCTATT GGTGTCAGGA 
1351 ACAAGGTGGC TGGAGGAAAA ATGAAGAAGC GGAAAGGCCG TTAATCACTT 
1401 TTATGAAGGC TCGAGTTCTG CTGTTCTGTA AAAGAAAATT GGAGAATGAA 
1451 ACCTGCTCCA ACAGAGATCA TGAGACTGAA ATTGGTCAGA ATTGTGTCCA 
1501 GAATGTGCTC AGCTAATTCA GTATTCTTCC CCATTCTGGG TTGGAGTTTA 
1551 CTGCAGAGTA ATTCTTACAG TGCTGATGTC AAGACTGTTA CTGTTCTTCG 
1601 ACTTTGATTC CTTGCTCATG ACATGAGTAG GGTGTGCTCT TCTGTCACTT 
1651 CACACAGACC TTTTGCCTTT TTTAGCTGCA AGTCAAGGAC TAGGTTGATG 
1701 ATGCCCATGA CCTGTAATTG TAAAGAAGCT TGGACATCTG CAAATGATAT 
1751 TTAAACCATC TTGGCTTGTG CTTTATTCAA ACTAATGTGA AACAATAAAT
1801 TTAAATATTA TTTTTAAAAG AAAAAAAAAA AAAAAAAAAA 

BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 
Peptide information for frame 3



ORF from 27 bp to 1391 bp; peptide length: 455
Category: strong similarity to known protein 



1 MAAPEEHDSP TEASQPIVEE EETKTFKDLG VTDVLCEACD QLGWTKPTKI 
51 QIEAIPLALQ GRDIIGLAET GSGKTGAFAL PILNALLETP QRLFALVLTP 
101 TRELAFQISE QFEALGSSIG VQSAVIVGGI DSMSQSLALA KKPHIIIATP 
151 GRLIDHLENT KGFNLRALKY LVMDEADRIL NMDFETEVDK ILKVIPRDRK 
201 TFLFSATMTK KVQKLQRAAL KNPVKCAVSS KYQTVEKLQQ YYIFIPSKFK 
251 DTYLVYILNE LAGNSFMIFC STCNNTQRTA LLLRNLGFTA IPLHGQMSQS 
301 KRLGSLNKFK AKARSILLAT DVASRGLDIP HVDVVVNFDI PTHSKDYIHR 
351 VGRTARAGRS GKAITFVTQY DVELFQRIEH LIGKKLPGFP TQDDEVMMLT 
401 ERVAEAQRFA RMELREHGEK KKRSREDAGD NDDTEGAIGV RNKVAGGKMK 
451 KRKGR 
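Below is a minimal translation sketch using the standard genetic code, applied with the 1-based ORF coordinates from the ORF line above (ORF from 27 bp). The 64-letter string lists the amino acid for each codon in TCAG order, with '*' for stop. `translate` is a hypothetical helper, and only the first 50 bases of the cDNA are used.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# map each codon to its amino acid (standard code, TCAG ordering)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna: str, start: int, end: int) -> str:
    """Translate the 1-based inclusive span start..end codon by codon."""
    orf = cdna[start - 1:end]
    return "".join(CODON_TABLE[orf[n:n + 3]] for n in range(0, len(orf) - 2, 3))


five_prime = "GGGGACTTCCGGAGACCTCACACAAGATGGCGGCACCCGAGGAACACGAT"  # bases 1-50
print(translate(five_prime, 27, 50))  # MAAPEEHD
```

Translating bases 27-50 gives MAAPEEHD, the first eight residues of the peptide listed above.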



BLASTP hits



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_6o17, frame 3

PIR:S40731 ATP-dependent RNA helicase homolog T26G10.1 - Caenorhabditis
elegans, N = 1, Score = 1497, P = 1.6e-153

PIR:S46713 hypothetical protein YHR065c - yeast (Saccharomyces
cerevisiae), N = 1, Score = 1154, P = 3.6e-117

TREMBL:ATH010462_1 gene: "RH10"; product: "RNA helicase"; Arabidopsis
thaliana mRNA for DEAD box RNA helicase, RH10, N = 1, Score = 1122, P =
8.9e-114



TREMBL:AC002985_2 product: "R27090_2"; Human DNA from chromosome 

19-specific cosmid R27090, genomic sequence, complete sequence., N = 1,
Score = 950, P = 1.5e-95 

>PIR:S40731 ATP-dependent RNA helicase homolog T26G10.1 - Caenorhabditis 
elegans 

Length = 489 



HSPs: 



Score = 1497 (224.6 bits), Expect = 1.6e-153, P = 1.6e-153 
Identities = 283/442 (64%), Positives = 364/442 (82%)



Query: 


19 


EEEETKTFKDLGVTDVLCEACDQLGWTKPTKIQIEAIPLALQGRDIIGLAETGSGKTGAF 


78 






E+ + K+F +LGV+ LC+AC +LGW KP+KIQ A+P ALQG+D+IGLAETGSGKTGAF 




Sbjct: 


39 


EDVKEKSFAELGVSQPLCDACQRLGWMKPSKIQQAALPHALQGKDVIGLAETGSGKTGAF 


98 


Query: 


79 


ALPILNALLETPQRLFALVLTPTRELAFQISEQFEALGSSIGVQSAVIVGGIDSMSQSLA 


138 






A+P+L +LL+ PQ F LVLT PTRELAFQI +QFEALGS IG+ +AVIVGG+D +Q++A 




Sbjct: 


99 


AIPVLQSLLDH PQA F FC LV LT PT REL A FQI GQQ FE ALG S G I GL I AAV I VGG V DMAAQ AMA 


158 


Query: 


139 


LAKKPHIIIATPGRLIDHLENTKGFNLRALKYLVMDEADRILNMDFETEVDKILKVIPRD 


198 






LA++PHII+ATPGRL+DHLENTKGFNL+ALK+L+MDEADRILNMDFE E+DKILKVTPR+ 




Sbjct: 


159 


LARRPHIIVATPGRLVDHLENTKGFNLKALKFLIMDEADRILNMDFEVELDKILKVIPRE 


218 


Query: 


199 


RKTFLFSATMTKKVQKLQRAALKNPVKCAVSSKYQTVEKLQQYYIFIPSKFKDTYLVYIL 


258 






R+T+LFSATMTKKV KL+RA+L++P + +VSS+Y+TV+ L+Q+YIF+P+K+K+TYLVY+L 




Sbjct: 


219 


RRTYLFSATMTKKVSKLERASLRDPARVSVSSRYKTVDNLKQHYIFVPNKYKETYLVYLL 


278 


Query: 


259 


NELAGNSFMI FCSTCNNTQRTALLLRNLGFTAI PLHGQMSQSKRLGSLNKFKAKARSILL 


318 






NE AGNS ++FC+TC T + A++LR LG A+PLHGQMSQ KRLGSLNKFK+KAR IL+ 




Sbjct: 


279 


NEHAGNSAIVFCATCATTMQIAVMLRQLGMQAVPLHGQMSQEKRLGSLNKFKSKAREILV 


338 


Query: 


319 


ATDVASRGLDI PHVDWVNFDI PTHSKDYIHRVGRTARAGRSGKAITFVTQYDVELFQRI 


378 






TDVA+RGLDIPHVD+V+N+D+P+ SKDY+HRVGRTARAGRSG AIT VTQYDVE +Q+I 




Sbjct: 


339 


CTDVAARGLDIPHVDMVINYDMPSQSKDYVHRVGRTARAGRSGIAITVVTQYDVEAYQKI 


398 


Query: 


379 


EHLIGKKLPGFPTQDDEVMMLTERVAEAQRFARMELREHGEKKK RSREDAGDNDD 


433 






E +GKKL + ++EVM+L ER EA AR+E++E EKKK R +D GD ++ 




Sbjct: 


399 


EANLGKKLDEYKCVENEVMVLVERTQEATENARI EMKEMDEKKKSGKKRRQNDDFGDTEE 


458 



Query: 434 TEGAIGVRNKVAGGKMKKRKGR 455 
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+ G + K GG+ GR 
Sbjct: 459 SGG RFKMG I K SMGG RGG S GGG R 480 
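
The Identities and Positives line in each HSP header summarizes the alignment midline. The following is a minimal Python sketch under two assumptions. First, it uses the usual BLAST midline convention: a letter marks an identical residue, '+' a positive-scoring substitution, and a space a mismatch or gap. Second, it rounds percentages to the nearest whole number, which is an assumption about the report's rounding; truncation would give the same values here. The function names are hypothetical. Because only letters and '+' are counted, the collapsed spacing in the OCR'd midlines does not change the totals.

```python
from typing import Tuple


def hsp_counts(midline: str) -> Tuple[int, int]:
    """Return (identities, positives) for one BLAST-style midline.

    Assumed convention: a letter marks an identical residue, '+' marks a
    positive-scoring substitution, and a space marks a mismatch or a gap.
    Positives include the identities.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return identities, positives


def format_hsp(identities: int, positives: int, aligned_length: int) -> str:
    """Render counts as in the HSP header, rounding to whole percentages."""
    pct_id = round(100 * identities / aligned_length)
    pct_pos = round(100 * positives / aligned_length)
    return (f"Identities = {identities}/{aligned_length} ({pct_id}%), "
            f"Positives = {positives}/{aligned_length} ({pct_pos}%)")


if __name__ == "__main__":
    # Totals reported for the PIR:S40731 HSP.
    print(format_hsp(283, 364, 442))
    # Counts for the first eight midline columns of that HSP.
    print(hsp_counts("E+ + K+F"))
```

With the totals above, `format_hsp(283, 364, 442)` reproduces the header line "Identities = 283/442 (64%), Positives = 364/442 (82%)". The eight-column fragment contains 3 identities and 6 positives.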



Pedant information for DKFZphfbr2_6ol7, frame 3

Report for DKFZphfbr2_6ol7.3



[LENGTH] 455

[MW] 50646.80

[pI] 9.18

[HOMOL] PIR:S40731 ATP-dependent RNA helicase homolog T26G10.1 - Caenorhabditis elegans 1e-167

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YHR065c] 1e-127

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YHR065c] 1e-127

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YHR169w] 2e-79

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 1e-71

[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 4e-66

[FUNCAT] J mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 1e-63

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 1e-58

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YDL084w] 1e-55

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YOR204w] 5e-55

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 5e-55

[FUNCAT] L genome replication, transcription, recombination and repair [H. influenzae, HI0892] 9e-48

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR276c] 2e-45

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194C] 4e-42

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 7e-16

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 7e-12

[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 7e-12

[FUNCAT] R general function prediction [M. jannaschii, MJ1401] 5e-06

[BLOCKS] BL00175B Phosphoglycerate mutase family phosphohistidine proteins

[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins

[PIRKW] nucleus 4e-60

[PIRKW] RNA binding 7e-69

[PIRKW] DEAD box 7e-69

[PIRKW] transmembrane protein 9e-41

[PIRKW] DNA binding 3e-55

[PIRKW] recF recombination pathway 3e-11

[PIRKW] ATP 1e-126

[PIRKW] purine nucleotide binding 7e-69

[PIRKW] P-loop 1e-126

[PIRKW] hydrolase 1e-55

[PIRKW] protein biosynthesis 7e-69

[PIRKW] ATP binding 3e-61

[SUPFAM] ATP-dependent RNA helicase eIF-4A 8e-06

[SUPFAM] WW repeat homology 4e-58

[SUPFAM] translation initiation factor eIF-4A 7e-69

[SUPFAM] DEAD/H box helicase homology 1e-126

[SUPFAM] recQ helicase homology 5e-12

[SUPFAM] ATP-dependent RNA helicase homology 8e-06

[SUPFAM] unassigned DEAD/H box helicases 1e-126

[SUPFAM] ATP-dependent RNA helicase DBP1 4e-60

[SUPFAM] ATP-dependent RNA helicase DHH1 1e-58

[SUPFAM] recQ protein 3e-11

[SUPFAM] tobacco ATP-dependent RNA helicase DB10 4e-58

[SUPFAM] Bloom's syndrome helicase 5e-12

[PROSITE] DEAD_ATP_HELICASE 1

[PROSITE] ATP_GTP_A 1

[PROSITE] MYRISTYL 5

[PROSITE] AMIDATION 1

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 6

[PROSITE] PKC_PHOSPHO_SITE 9

[PROSITE] ASN_GLYCOSYLATION 1

[PFAM] Helicases conserved C-terminal domain

[PFAM] DEAD and DEAH box helicases

[KW] Alpha_Beta





SEQ MAAPEEHDSPTEASQPIVEEEETKTFKDLGVTDVLCEACDQLGWTKPTKIQIEAIPLALQ 

PRD cccccccccccccccchhhhhhhhhhhccccchhhhhhhhhhcccccccccccccccccc 

SEQ GRDIIGLAETGSGKTGAFALPILNALLETPQRLFALVLTPTRELAFQISEQFEALGSSIG 

PRD ccceeeeeccccccceeehhhhhhhhcccccceeeeeeccchhhhhhhhhhhhhhhhhcc 
SEQ VQSAVIVGGIDSMSQSLALAKKPHIIIATPGRLIDHLENTKGFNLRALKYLVMDEADRIL

PRD eeeeeeeccchhhhhhhhhhccceeeeeccccccccccccccccccccceeehhhhhhhh 

SEQ NMDFETEVDKILKVIPRDRKTFLFSATMTKKVQKLQRAALKNPVKCAVSSKYQTVEKLQQ

PRD hhcchhhhhhhhhhcccchhhhhhhhccchhhhhhhhhhhccceeeeeecccccchhhhh 

SEQ YYIFIPSKFKDTYLVYILNELAGNSFMIFCSTCNNTQRTALLLRNLGFTAIPLHGQMSQS 

PRD hhhhhhhhhhhhhhhhhhhhhccceeeeeeecchhhhhhhhhhhhcccceeeccccchhh 

SEQ KRLGSLNKFKAKARSILLATDVASRGLDIPHVDVVVNFDIPTHSKDYIHRVGRTARAGRS

PRD hhhhhhhhhhhhhhhcchhhhhhhhcccccceeeeeecccccccceeeeecccccccccc 

SEQ GKAITFVTQYDVELFQRIEHLIGKKLPGFPTQDDEVMMLTERVAEAQRFARMELREHGEK 

PRD cceeeeeecchhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KKRSREDAGDNDDTEGAIGVRNKVAGGKMKKRKGR

PRD hhhhccccccccccccccccccccccccccccccc 



Prosite for DKFZphfbr2_6ol7.3

PS00001  274->278  ASN_GLYCOSYLATION   PDOC00001
PS00004  421->425  CAMP_PHOSPHO_SITE   PDOC00004
PS00005   25->28   PKC_PHOSPHO_SITE    PDOC00005
PS00005   72->75   PKC_PHOSPHO_SITE    PDOC00005
PS00005  209->212  PKC_PHOSPHO_SITE    PDOC00005
PS00005  229->232  PKC_PHOSPHO_SITE    PDOC00005
PS00005  276->279  PKC_PHOSPHO_SITE    PDOC00005
PS00005  300->303  PKC_PHOSPHO_SITE    PDOC00005
PS00005  354->357  PKC_PHOSPHO_SITE    PDOC00005
PS00005  360->363  PKC_PHOSPHO_SITE    PDOC00005
PS00005  400->403  PKC_PHOSPHO_SITE    PDOC00005
PS00006    9->13   CK2_PHOSPHO_SITE    PDOC00006
PS00006   25->29   CK2_PHOSPHO_SITE    PDOC00006
PS00006  186->190  CK2_PHOSPHO_SITE    PDOC00006
PS00006  368->372  CK2_PHOSPHO_SITE    PDOC00006
PS00006  391->395  CK2_PHOSPHO_SITE    PDOC00006
PS00006  424->428  CK2_PHOSPHO_SITE    PDOC00006
PS00008   66->72   MYRISTYL            PDOC00008
PS00008   71->77   MYRISTYL            PDOC00008
PS00008  116->122  MYRISTYL            PDOC00008
PS00008  120->126  MYRISTYL            PDOC00008
PS00008  128->134  MYRISTYL            PDOC00008
PS00009  382->386  AMIDATION           PDOC00009
PS00017   68->76   ATP_GTP_A           PDOC00017
PS00039  172->181  DEAD_ATP_HELICASE   PDOC00039



Pfam for DKFZphfbr2_6ol7.3



HMM_NAME DEAD and DEAH box helicases

HMM *gLpPWILRnIyeMGFEkPTPIQQqAIPiILeGRDVMACAQTGSGKTAAF
G ++ ++++++++G++KPT+IQ +AIP++L+GRD+++ A TGSGKT+AF
Query 30 GVTDVLCEACDQLGWTKPTKIQIEAIPLALQGRDIIGLAETGSGKTGAF 78

HMM llPMLQHIDwdPWpqpPQdPrALILAPTRELAMQIQEEcRkFgkHMnglR
++P+L ++++P + ++AL+L+PTRELA QI+E+++++G++++ ++

Query 79 ALPILNALLETPQR-LFALVLTPTRELAFQISEQFEALGSSIG-VQ 122

HMM ImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIER.gtldLDrleML

+++I+GG + + Q L+++P HI+IATPGRLIDH+E+ ++L+++++L
Query 123 SAVIVGGIDSMSQSLALAKKP-HIIIATPGRLIDHLENTKGFNLRALKYL 171

HMM VMDEADRMLDMGFIDQIRrlMrqIPMpwNRQTMMFSATMPdelqELARrF

VMDEADR+L+M+F+ ++++I++ IP ++R T +FSATM++++Q+L+R+
Query 172 VMDEADRILNMDFETEVDKILKVIP--RDRKTFLFSATMTKKVQKLQRAA 219

HMM MRNPIRInldMdElTtnEnlkQwYiyVerEMWKfdcLcrLIe*

++NP+ ++ ++++T++ ++Q+YI+++ + K +L+++++
Query 220 LKNPVKCAVSSKYQTVE-KLQQYYIFIP-SKFKDTYLVYILN 259

HMM_NAME Helicases conserved C-terminal domain

HMM *EileeWLknlGIrvmYIHGdMpQeERdeIMddFNnGEynVLIcTDVggR
++ + L+NLG++++ +HG+M+Q +R+ +++F++ +L++TDV++R
Query 277 QRTALLLRNLGFTAIPLHGQMSQSKRLGSLNKFKAKARSILLATDVASR 325

HMM GIDIPdVNHVINYDMPWNPEqYIQRIGRTgRIG*

G+DIP V++V+N+D+P ++ +YI+R+GRT+R+G
Query 326 GLDIPHVDVVVNFDIPTHSKDYIHRVGRTARAG 358
DKFZphfbr2_71o20
group: brain derived 

DKFZphfbr2_71o20 encodes a novel 232 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

unknown 

complete cDNA, complete cds, EST hits 

on genomic level encoded by AC006186 (3 exons) 

Sequenced by GBF 

Locus: /map="10q22.1"

Insert length: 1768 bp 

Poly A stretch at pos. 1742, polyadenylation signal at pos. 1726

1 GGGGGCAGCA GGCCAAGGGG GAGGTGCGAG CGTGGACCTG GGACGGGTCT 

51 GGGCGGCTCT CGGTGGTTGG CACGGGTTCG CACACCCATT CAAGCGGCAG 

101 GACGCACTTG TCTTAGCAGT TCTCGCTGAC CGCGCTAGCT GCGGCTTCTA 

151 CGCTCCGGCA CTCTGAGTTC ATCAGCAAAC GCCCTGGCGT CTGTCCTCAC 

201 CATGCCTAGC CTTTGGGACC GCTTCTCGTC GTCGTCCACC TCCTCTTCGC 

251 CCTCGTCCTT GCCCCGAACT CCCACCCCAG ATCGGCCGCC GCGCTCAGCC 

301 TGGGGGTCGG CGACCCGGGA GGAGGGGTTT GACCGCTCCA CGAGCCTGGA 

351 GAGCTCGGAC TGCGAGTCCC TGGACAGCAG CAACAGTGGC TTCGGGCCGG 

401 AGGAAGACAC GGCTTACCTG GATGGGGTGT CGTTGCCCGA CTTCGAGCTG 

451 CTCAGTGACC CTGAGGATGA ACACTTGTGT GCCAACCTGA TGCAGCTGCT 

501 GCAGGAGAGC CTGGCCCAGG CGCGGCTGGG CTCTCGACGC CCTGCGCGCC 

551 TGCTGATGCC TAGCCAGTTG GTAAGCCAGG TGGGCAAAGA ACTACTGCGC 

601 CTGGCCTACA GCGAGCCGTG CGGCCTGCGG GGGGCGCTGC TGGACGTCTG 

651 CGTGGAGCAG GGCAAGAGCT GCCACAGCGT GGGCCAGCTG GCACTCGACC 

701 CCAGCCTGGT GCCCACCTTC CAGCTGACCC TCGTGCTGCG CCTGGACTCA 

751 CGACTCTGGC CCAAGATCCA GGGGCTGTTT AGCTCCGCCA ACTCTCCCTT 

801 CCTCCCTGGC TTCAGCCAGT CCCTGACGCT GAGCACTGGC TTCCGAGTCA 

851 TCAAGAAGAA GCTGTACAGC TCGGAACAGC TGCCCATTGA GGAGTGTTGA 

901 ACTTCAACCT GAGGGGGCCG ACAGTGCCCT CCAAGACAGA GACGACTGAA 

951 CTTTTGGGGT GGAGACTAGA GGCAGGAGCT GAGGGACTGA TTCCAGTGGT 

1001 TGGAAAACTG AGGCAGCCAC CTAAAGTGGA GGTGGGGGAA TAGTGTTTCC 

1051 CAGGAAGCTC ATTGAGTTGT GTGCGGGTGG CTGTGCATTG GGGACACATA 

1101 CCCCTCAGTA CTGTAGCATG AAACAAAGGC TTAGGGGCCA ACAAGGCTTC 

1151 CAGCTGGATG TGTGTGTAGC ATGTACCTTA TTATTTTTGT TACTGACAGT 

1201 TAACAGTGGT GTGACATCCA GAGAGCAGCT GGGCTGCTCC CGCCCCAGCC 

1251 TGGCCCAGGG TGAAGGAAGA GGCACGTGCT CCTCAGAGCA GCCGGAGGGA 

1301 AGGGGGAGGT CGGAGGTCGT GGAGGTGGTT TGTGTATCTT ACTGGTCTGA 

1351 AGGGACCAAG TGTGTTTGTT GTTTGTTTTG TATCTTGTTT TTCTGATCGG 

1401 AGCATCACTA CTGACCTGTT GTAGGCAGCT ATCTTACAGA CGCATGAATG 

1451 TAAGAGTAGG AAGGGGTGGG TGTCAGGGAT CACTTGGGAT CTTTGACACT 

1501 TGAAAAATTA CACCTGGCAG CTGCGTTTAA GCCTTCCCCC ATCGTGTACT 

1551 GCAGAGTTGA GCTGGCAGGG GAGGGGCTGA GAGGGTGGGG GCTGGAACCC 

1601 CTTCCCGGGA GGAGTGCCAT CTGGGTCTTC CATCTAGAAC TGTTTACATG 

1651 AAGATAAGAT ACTCACTGTT CATGAATACA CTTGATGTTC AAGTATTAAG 

1701 ACCTATGCAA TATTTTTTAC TTTTCTAATA AACATGTTTG TTAAAACAAA 
1751 AAAAAAAAAA AAAAAAAA 



BLAST Results 



Entry AC006186 from database EMBLNEW: 

*** SEQUENCING IN PROGRESS *** Homo sapiens chromosome 10 clone 
CRI-JC2048 map 10q22.1; HTGS phase 1, 4 unordered pieces. 
Score = 6512, P = 0.0e+00, identities = 1326/1345 
3 exons 



Medline entries 



No Medline entry 
Peptide information for frame 1



ORF from 202 bp to 897 bp; peptide length: 232 
Category: putative protein 



1 MPSLWDRFSS SSTSSSPSSL PRTPTPDRPP RSAWGSATRE EGFDRSTSLE 
51 SSDCESLDSS NSGFGPEEDT AYLDGVSLPD FELLSDPEDE HLCANLMQLL
101 QESLAQARLG SRRPARLLMP SQLVSQVGKE LLRLAYSEPC GLRGALLDVC 
151 VEQGKSCHSV GQLALDPSLV PTFQLTLVLR LDSRLWPKIQ GLFSSANSPF 
201 LPGFSQSLTL STGFRVIKKK LYSSEQLPIE EC 
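
Each "ORF from ... bp to ... bp" line fixes the peptide length. The span is inclusive, and in these reports the end coordinate appears to stop just before the stop codon. For DKFZphfbr2_71o20 the ORF runs from 202 to 897 and a TGA follows at 898-900, so (897 - 202 + 1) / 3 = 232 codons. The Python sketch below checks that arithmetic and translates a region with the standard genetic code. The helper names are illustrative and are not taken from the Pedant pipeline.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in
# TCAG order of first, second and third base; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def orf_length(start: int, end: int) -> int:
    """Codon count for an ORF given 1-based, inclusive nucleotide coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError(f"ORF span of {span} nt is not a whole number of codons")
    return span // 3


def translate_orf(cdna: str, start: int, end: int) -> str:
    """Translate cdna[start..end] (1-based, inclusive); unknown codons become 'X'."""
    orf = cdna.upper()[start - 1:end]
    usable = len(orf) - len(orf) % 3
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, usable, 3))


# Nucleotides 201-240 of DKFZphfbr2_71o20, copied from the listing above.
FRAGMENT_71O20 = "".join("CATGCCTAGC CTTTGGGACC GCTTCTCGTC GTCGTCCACC".split())

if __name__ == "__main__":
    print(orf_length(202, 897))
    # Positions 2-37 of the fragment are cDNA positions 202-237.
    print(translate_orf(FRAGMENT_71O20, 2, 37))
```

Run as shown, this prints 232 and MPSLWDRFSSSS, the first twelve residues of the peptide listed above. The same arithmetic gives 715, 165 and 344 residues for the ORFs reported for DKFZphfbr2_72bl8, DKFZphfbr2_72dl3 and DKFZphfbr2_72112.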

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_71o20, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_71o20, frame 1 



Report for DKFZphfbr2_71o20.1



[LENGTH] 232 

[MW] 25354.60 

[pI] 4.87

[PROSITE] MYRISTYL 2

[PROSITE] CK2_PHOSPHO_SITE 6

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 1

[KW] All_Alpha

[KW] LOW_COMPLEXITY 17.67 %



SEQ MPSLWDRFSSSSTSSSPSSLPRTPTPDRPPRSAWGSATREEGFDRSTSLESSDCESLDSS 

SEG xxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ NSGFGPEEDTAYLDGVSLPDFELLSDPEDEHLCANLMQLLQESLAQARLGSRRPARLLMP 

SEG xx 

PRD cccccccccccccccccccceeeccccccchhhhhhhhhhhhhhhhhhccccccceeecc 

SEQ SQLVSQVGKELLRLAYSEPCGLRGALLDVCVEQGKSCHSVGQLALDPSLVPTFQLTLVLR 

SEG 

PRD ccccchhhhhhhhhhhcccccchhhhhhhhccccccccccccccccccccchhhhhhccc 

SEQ LDSRLWPKIQGLFSSANSPFLPGFSQSLTLSTGFRVIKKKLYSSEQLPIEEC 

SEG 

PRD cccccccccccccccccccccccccceeeecccccccccccccccccccccc 
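
The LOW_COMPLEXITY percentage appears to be the share of residues masked with 'x' in the SEG rows. The SEG rows above seem to mark 41 residues, and 41 of 232 gives 17.67%. The same ratio reproduces the 29.70% reported for DKFZphfbr2_72dl3.3 (49 of 165) and the 4.20% reported for DKFZphfbr2_72bl8.2 (30 of 715). The sketch below assumes this is how the figure is derived; the function name is hypothetical.

```python
from typing import Iterable


def low_complexity_percent(seg_rows: Iterable[str], length: int) -> str:
    """Percentage of residues masked by SEG, assuming one 'x' per masked residue."""
    masked = sum(row.count("x") for row in seg_rows)
    return f"{100 * masked / length:.2f}"


if __name__ == "__main__":
    # Illustrative SEG rows with 24 + 15 and 2 masked residues (41 in total).
    rows = ["x" * 24 + " " + "x" * 15, "x" * 2]
    print(low_complexity_percent(rows, 232))
```

The "SEG" row labels contain no lowercase 'x', so whole report rows can be passed in unchanged.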
Prosite for DKFZphfbr2_71o20.1

PS00002   62->66   GLYCOSAMINOGLYCAN   PDOC00002
PS00005  111->114  PKC_PHOSPHO_SITE    PDOC00005
PS00006    3->7    CK2_PHOSPHO_SITE    PDOC00006
PS00006   38->42   CK2_PHOSPHO_SITE    PDOC00006
PS00006   47->51   CK2_PHOSPHO_SITE    PDOC00006
PS00006   52->56   CK2_PHOSPHO_SITE    PDOC00006
PS00006   77->81   CK2_PHOSPHO_SITE    PDOC00006
PS00006   85->89   CK2_PHOSPHO_SITE    PDOC00006
PS00008  141->147  MYRISTYL            PDOC00008
PS00008  191->197  MYRISTYL            PDOC00008
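
Each Prosite row gives a start and an end position. Checking these against the peptide shows that the start is 1-based and the end is exclusive. For example, the CK2 site reported as 3->7 is S3-L4-W5-D6, a match to [ST]-x(2)-[DE]. The Python sketch below converts simple PROSITE patterns to regular expressions and reports overlapping matches in that convention. The pattern strings are transcribed from PROSITE entries PS00001, PS00002, PS00005, PS00006 and PS00008 and should be checked against the current PROSITE release. The converter is a simplification that ignores anchors ('<', '>') and other syntax these patterns do not use. Run on DKFZphfbr2_71o20.1, it reproduces the positions listed for this entry.

```python
import re
from typing import List

# PROSITE patterns transcribed from entries PS00001, PS00002, PS00005,
# PS00006 and PS00008 (assumed; check against the current PROSITE release).
PATTERNS = {
    "ASN_GLYCOSYLATION": "N-{P}-[ST]-{P}",
    "GLYCOSAMINOGLYCAN": "S-G-x-G",
    "PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
    "CK2_PHOSPHO_SITE": "[ST]-x(2)-[DE]",
    "MYRISTYL": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",
}


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern (no '<' or '>' anchors) to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        # '[ST]'-style classes and single residues are already valid regex.
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(sequence: str, pattern: str) -> List[str]:
    """List overlapping matches as 'start->end' (1-based start, exclusive end)."""
    finder = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    hits = []
    for match in finder.finditer(sequence):
        start = match.start() + 1
        hits.append(f"{start}->{start + len(match.group(1))}")
    return hits


# Peptide of DKFZphfbr2_71o20, frame 1, copied from the listing above.
PEPTIDE_71O20 = "".join("""
MPSLWDRFSS SSTSSSPSSL PRTPTPDRPP RSAWGSATRE EGFDRSTSLE
SSDCESLDSS NSGFGPEEDT AYLDGVSLPD FELLSDPEDE HLCANLMQLL
QESLAQARLG SRRPARLLMP SQLVSQVGKE LLRLAYSEPC GLRGALLDVC
VEQGKSCHSV GQLALDPSLV PTFQLTLVLR LDSRLWPKIQ GLFSSANSPF
LPGFSQSLTL STGFRVIKKK LYSSEQLPIE EC
""".split())

if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name, scan(PEPTIDE_71O20, pattern))
```

The zero-width lookahead makes `finditer` report overlapping sites, which the table requires. For example, CK2 sites at 47 and 52 could otherwise hide neighbouring matches.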



(No Pfam data available for DKFZphfbr2_71o20.1)
DKFZphfbr2_72bl8

group: nucleic acid management 

DKFZphfbr2_72bl8 encodes a novel 715 amino acid protein with similarity to the E. coli
DNA-damage-inducible protein dinP and other proteins induced by DNA damage.

The novel protein is similar to dinP of E. coli, yqjH of B. subtilis, dinP of M. tuberculosis 
and T19K24.15 of A. thaliana. The dinB/P pathway is a second SOS-pathway in E. coli. Therefore 
the new gene seems to be involved in DNA repair. 

The new protein can find application in modulating DNA repair and mutagenesis.

similarity to DNA damage induced genes

complete cDNA, complete cds, potential start at Bp 49, EST hits 
localisation primer site B is missing! 

Sequenced by LMU 

Locus: /map="416.0 cR from top of Chr18 linkage group"??
Insert length: 2475 bp 

Poly A stretch at pos. 2452, polyadenylation signal at pos. 2431 

1 GGGGGAGGAA GGCGGCGGCG ACGACGAGGA AGACGCCGAG GCCTGGGCCA 
51 TGGAACTGGC GGACGTGGGG GCGGCAGCCA GCTCGCAGGG AGTTCATGAT 

101 CAAGTGTTGC CCACACCAAA TGCTTCATCC AGAGTCATAG TACATGTGGA 

151 TCTGGATTGC TTTTATGCAC AAGTAGAAAT GATCTCAAAT CCAGAGCTAA 

201 AAGACAAACC TTTAGGGGTT CAACAGAAAT ATTTGGTGGT TACCTGCAAC 

251 TATGAAGCTA GGAAACTTGG AGTTAAGAAA CTTATGAATG TCAGAGATGC 

301 AAAAGAAAAG TGTCCACAGT TGGTATTAGT TAATGGAGAA GACCTGACCC 

351 GCTACAGAGA AATGTCTTAT AAGGTTACAG AATTACTGGA AGAATTTAGT 

401 CCAGTTGTTG AGAGACTTGG ATTTGATGAA AATTTTGTGG ATCTAACAGA 

451 AATGGTTGAG AAGAGACTAC AGCAGCTGCA AAGTGATGAA CTTTCTGCGG 

501 TGACTGTGTC GGGTCATGTA TACAATAATC AGTCTATAAA CCTGCTTGAC 

551 GTCTTGCACA TCAGACTACT TGTTGGATCT CAGATTGCAG CAGAGATGCG 

601 GGAAGCCATG TATAATCAGT TGGGGCTCAC TGGCTGTGCT GGAGTGGCTT 

651 CTAATAAACT GTTGGCAAAA TTAGTTTCTG GTGTCTTTAA ACCAAATCAA 

701 CAAACAGTCT TATTACCTGA AAGTTGTCAA CATCTTATTC ATAGTTTGAA 

751 TCACATAAAG GAAATACCTG GTATTGGCTA TAAAACTGCC AAATGTCTTG 

801 AAGCACTGGG TATCAATAGT GTGCGTGATC TCCAAACCTT TTCACCCAAA 

851 ATTTTAGAAA AAGAATTAGG AATTTCAGTT GCTCAGCGTA TCCAAAAGCT 

901 CAGTTTTGGA GAGGATAACT CCCCTGTGAT ACTCTCAGGA CCACCTCAGT 

951 CCTTTAGTGA AGAAGATTCA TTTAAAAAAT GTACATCTGA AGTTGAAGCT 
1001 AAAAATAAGA TTGAAGAACT ACTTGCTAGT CTTTTAAACA GAGTATGCCA 
1051 AGATGGAAGG AAGCCTCATA CAGTGAGATT AATAATCCGT CGGTATTCCT 
1101 CTGAGAAGCA CTATGGTCGT GAGAGTCGTC AGTGCCCTAT TCCTTCACAT 
1151 GTAATTCAGA AATTAGGGAC AGGAAATTAT GATGTGATGA CCCCAATGGT 
1201 TGATATACTT ATGAAACTTT TTCGAAATAT GGTGAATGTG AAGATGCCAT 
1251 TTCACCTTAC CCTTCTAAGT GTGTGCTTCT GCAACCTTAA AGCACTAAAT 
1301 ACTGCTAAGA AAGGGCTTAT TGATTATTAT TTAATGCCAT CATTATCAAC
1351 TACTTCACGC TCTGGCAAGC ACAGTTTTAA AATGAAAGAC ACTCATATGG 
1401 AAGATTTTCC CAAAGACAAA GAAACAAACC GGGATTTCCT ACCAAGTGGA
1451 AGAATTGAAA GTACAAGAAC TAGGGAGTCT CCACTAGATA CCACAAATTT 
1501 TTCTAAAGAA AAAGACATTA ATGAATTCCC ACTCTGTTCA CTTCCTGAAG 
1551 GTGTTGACCA AGAAGTCTCC AAGCAGCTTC CAGTAGATAT TCAAGAAGAA 
1601 ATCCTTTCTG GAAAATCTAG GGAAAAATTT CAAGGGAAAG GAAGTGTGAG 
1651 TTGTCCATTA CATGCCTCTA GAGGAGTATT ATCTTTCTTT TCTAAAAAAC 
1701 AAATGCAAGA TATTCCCATA AATCCTAGAG ATCATTTATC CAGTAGCAAA 
1751 CAGGTATCCT CTGTATCTCC TTGTGAACCG GGAACATCAG GCTTTAATAG 
1801 CAGTAGTTCT TCTTACATGT CTAGCCAAAA GGATTATTCA TATTATTTAG 
1851 ATAATAGATT AAAAGATGAA CGAATAAGTC AAGGACCTAA AGAACCTCAA 
1901 GGATTCCACT TTACAAATTC AAACCCTGCT GTGTCTGCTT TTCATTCATT 
1951 TCCAAACTTG CAGAGTGAGC AACTTTTCTC CAGAAACCAC ACTACAGATA 
2001 GCCATAAGCA AACAGTAGCA ACAGACTCTC ATGAAGGACT TACAGAAAAT 
2051 AGAGAGCCAG ATTCTGTTGA TGAGAAAATT ACTTTCCCTT CTGACATTGA 
2101 TCCTCAAGTT TTCTATGAAC TACCAGAAGC AGTACAAAAG GAACTGCTGG 
2151 CAGAGTGGAA GAGAACAGGA TCAGATTTCC ACATTGGACA TAAATAAGCA 
2201 TATTCAGCAA AAAGGTCTGA AAAGCAAGGG AATACCATTA TTTTCGGATT 
2251 AGCGGTTTAT TAAGCTCTTC TATATTAAAC ACTAATAGAT ATTCAATAAC 
2301 GGAGTAAACT GTTCCAGATA AAGCAAGAAT AGTTGCAAGA AGTAAATTCT 
2351 GGCACAAAGC GTAAAAATAT AACAGAAGAA ATAATGTAAA ATACTATCTT 
2401 TTATGTCTAA AGCCATTTTA TATTACTTTT CAATAAAAAG AATATCATGG 
2451 TCAAAAAAAA AAAAAAAAAA AAAAC
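
The "Poly A stretch" and "polyadenylation signal" positions in these entries appear to be 0-based offsets. For DKFZphfbr2_72bl8, the listing numbers the AATAAA hexamer from nucleotide 2432 and the tail from 2453, while the header reports 2431 and 2452. The same one-position shift holds for DKFZphfbr2_71o20, DKFZphfbr2_72dl3 and DKFZphfbr2_72112. The Python sketch below is a simplified heuristic that reports positions in that convention. Several parts of it are assumptions: the 50 nt window, the minimum run of 8 A's, and the motif list (AATAAA plus the common ATTAAA variant). It will not reproduce every reported position, for example where the annotated tail tolerates an interrupting C, as in DKFZphfbr2_71o20.

```python
import re
from typing import Optional, Tuple


def poly_a_start(seq: str, offset: int = 0, min_run: int = 8) -> Optional[int]:
    """0-based start of the longest run of at least `min_run` A's.

    Simplified heuristic: unlike the report, it does not tolerate an
    interrupting base inside the tail.
    """
    runs = [m for m in re.finditer(r"A+", seq.upper()) if len(m.group()) >= min_run]
    if not runs:
        return None
    longest = max(runs, key=lambda m: len(m.group()))
    return offset + longest.start()


def poly_a_signal(seq: str, tail_start: int, offset: int = 0, window: int = 50,
                  motifs: Tuple[str, ...] = ("AATAAA", "ATTAAA")) -> Optional[int]:
    """0-based start of the hexamer closest upstream of the tail, within `window` nt."""
    upper = seq.upper()
    end = tail_start - offset              # tail start relative to `seq`
    begin = max(0, end - window)
    hits = [upper.rfind(motif, begin, end) for motif in motifs]
    hits = [h for h in hits if h != -1]
    return offset + max(hits) if hits else None


# Nucleotides 2401-2475 of DKFZphfbr2_72bl8 (0-based offset 2400).
TAIL_72BL8 = "".join("""
TTATGTCTAA AGCCATTTTA TATTACTTTT CAATAAAAAG AATATCATGG
TCAAAAAAAA AAAAAAAAAA AAAAC
""".split())

if __name__ == "__main__":
    tail = poly_a_start(TAIL_72BL8, offset=2400)
    print(tail, poly_a_signal(TAIL_72BL8, tail, offset=2400))
```

For the DKFZphfbr2_72bl8 tail this prints 2452 and 2431, matching the header line above.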



BLAST Results 
Entry HS086339 from database EMBL:
human STS WI-11064. 
Score = 1523, P = 3.0e-64, identities = 327/343



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 50 bp to 2194 bp; peptide length: 715 
Category: similarity to known protein 



1 MELADVGAAA SSQGVHDQVL PTPNASSRVI VHVDLDCFYA QVEMISNPEL 
51 KDKPLGVQQK YLVVTCNYEA RKLGVKKLMN VRDAKEKCPQ LVLVNGEDLT 
101 RYREMSYKVT ELLEEFSPVV ERLGFDENFV DLTEMVEKRL QQLQSDELSA
151 VTVSGHVYNN QSINLLDVLH IRLLVGSQIA AEMREAMYNQ LGLTGCAGVA 
201 SNKLLAKLVS GVFKPNQQTV LLPESCQHLI HSLNHIKEIP GIGYKTAKCL
251 EALGINSVRD LQTFSPKILE KELGISVAQR IQKLSFGEDN SPVILSGPPQ 
301 SFSEEDSFKK CTSEVEAKNK IEELLASLLN RVCQDGRKPH TVRLIIRRYS 
351 SEKHYGRESR QCPIPSHVIQ KLGTGNYDVM TPMVDILMKL FRNMVNVKMP 
401 FHLTLLSVCF CNLKALNTAK KGLIDYYLMP SLSTTSRSGK HSFKMKDTHM
451 EDFPKDKETN RDFLPSGRIE STRTRESPLD TTNFSKEKDI NEFPLCSLPE 
501 GVDQEVSKQL PVDIQEEILS GKSREKFQGK GSVSCPLHAS RGVLSFFSKK 
551 QMQDIPINPR DHLSSSKQVS SVSPCEPGTS GFNSSSSSYM SSQKDYSYYL 
601 DNRLKDERIS QGPKEPQGFH FTNSNPAVSA FHSFPNLQSE QLFSRNHTTD 
651 SHKQTVATDS HEGLTENREP DSVDEKITFP SDIDPQVFYE LPEAVQKELL 
701 AEWKRTGSDF HIGHK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72bl8, frame 2 

PIR:H64747 DNA-damage-inducibile protein dinP - Escherichia coli, N = 
2, Score = 212, P = 4.2e-27 

PIR:H69963 DNA-damage repair protein homolog yqjH - Bacillus subtilis, 
N = 2, Score = 230, P = 5.2e-26



>PIR:H69963 DNA-damage repair protein homolog yqjH - Bacillus subtilis 
Length = 414 

HSPs: 

Score = 230 (34.5 bits), Expect = 5.2e-26, Sum P(2) = 5.2e-26 
Identities = 47/112 (41%), Positives = 73/112 (65%)

Query: 27 SRVIVHVDLDCFYAQVEMISNPELKDKPLGV-----QQKYLVVTCNYEARKLGVKKLMNV 81

SR+I H+D++ FYA VEM +P L+ KP+ V ++K +VVTC+YEAR GVK M V

Sbjct: 5 SRIIFHIDMNSFYASVEMAYDPALRGKPVAVAGNVKERKGIVVTCSYEARARGVKTTMPV 64

Query: 82 RDAKEKCPQLVLVNGEDLTRYREMSYKVTELLEEFSPVVERLGFDENFVDLTE 134

AK CP+L+++ + RYR S + +L E++ +VE + DE ++D+T+ 
Sbjct: 65 WQAKRHCPELIVLP-PNFDRYRNSSRAMFTILREYTDLVEPVSIDEGYMDMTD 116 

Score = 137 (20.6 bits), Expect = 5.2e-26, Sum P(2) = 5.2e-26 
Identities = 43/148 (29%), Positives = 75/148 (50%) 

Query: 178 QIAAEMREAMYNQLGLTGCAGVASNKLLAKLVSGVFKPNQQTVLLPESCQHLIHSLNHIK 237 

+ A E++ + +L L G+A NK LAK+ S + KP T+L ++ L + 

Sbjct: 125 ETAKEIQSRLQKELLLPSSIGIAPNKFLAKMASDMKKPLGITILRKRQVPDILWPLP-VG 183 

Query: 238 EIPGIGYKTAKCLEALGINSVRDLQTFSPKILEKELGISVAQRIQKLSFGEDNSPVILSG 297 

E+ G+G KTA+ L+ LGI+++ +L L++ LGI+ R++ + G ++PV 
Sbjct: 184 EMHGVGKKTAEKLKGLGIHTIGELAAADEHSLKRLLGIN-GPRLKNKANGIHHAPV 238 

Query: 298 PPQSFSEEDSFKKCTSEVEAKNKIEELL 325
P+ E S ++ + EELL
Sbjct: 239 DPERIYEFKSVGNSSTLSHDSSDEEELL 266

WO 01/12659 PCT/IB00/01496

Pedant information for DKFZphfbr2_72bl8, frame 2 
Report for DKFZphfbr2_72bl8.2

[LENGTH] 715 

[MW] 80300.63 

[pI] 6.37

[HOMOL] TREMBL:SPBC16A3_11 gene: "SPBC16A3.11"; product: "hypothetical protein";
S.pombe chromosome II cosmid c16A3. 5e-30

[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision
repair) [S. cerevisiae, YDR419w] 2e-15

[FUNCAT] L genome replication, transcription, recombination and repair [M. genitalium, MG360] 3e-13

[PIRKW] SOS mutagenesis 2e-11

[PIRKW] DNA repair 2e-11

[PIRKW] induced mutagenesis 2e-11

[SUPFAM] umuC protein 3e-29 

[PROSITE] MYRISTYL 6 

[PROSITE] AMIDATION 1

[PROSITE] CAMP_PHOSPHO_SITE 2

[PROSITE] CK2_PHOSPHO_SITE 15

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 21 

[PROSITE] ASN_GLYCOSYLATION 5 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.20 % 

SEQ MELADVGAAASSQGVHDQVLPTPNASSRVIVHVDLDCFYAQVEMISNPELKDKPLGVQQK 

SEG 

PRD ccceeeeeeecccccceeeccccccceeeeeeeccchhhhhhhhhccccccccceeeecc 

SEQ YLVVTCNYEARKLGVKKLMNVRDAKEKCPQLVLVNGEDLTRYREMSYKVTELLEEFSPVV

SEG 

PRD ceeeehhhhhhhhhhcccchhhhhhhhccceeeeccccccchhhhhhhhhhhhhhhccce 

SEQ ERLGFDENFVDLTEMVEKRLQQLQSDELSAVTVSGHVYNNQSINLLDVLHIRLLVGSQIA 

SEG 

PRD eeeccchhhhhhhhhhhhhhhhhhccccceeeeeccccccchhhhhhhhhhhhhhhhhhh 

SEQ AEMREAMYNQLGLTGCAGVASNKLLAKLVSGVFKPNQQTVLLPESCQHLIHSLNHIKEIP 

SEG 

PRD hhhhhhhhhhhcceeeeccchhhhhhhhhhhhhcccceeeeecchhhhhhhhhccccccc 

SEQ GIGYKTAKCLEALGINSVRDLQTFSPKILEKELGISVAQRIQKLSFGEDNSPVILSGPPQ 

SEG 

PRD ccchhhhhhhhhhccccchhhhhhhhhhhhhhccchhhhhhhhhhcccccceeeeccccc 

SEQ SFSEEDSFKKCTSEVEAKNKIEELLASLLNRVCQDGRKPHTVRLIIRRYSSEKHYGRESR 

SEG 

PRD ccccccccccchhhhhhhhhhhhhhhhhhhhhhhccccccceeeehhhhhhhhhhhcccc 

SEQ QCPIPSHVIQKLGTGNYDVMTPMVDILMKLFRNMVNVKMPFHLTLLSVCFCNLKALNTAK 

SEG 

PRD ccccccceeeeccccccccchhhhhhhhhhhhhhhhhcccceeeeeeeeechhhhhhhhh 

SEQ KGLIDYYLMPSLSTTSRSGKHSFKMKDTHMEDFPKDKETNRDFLPSGRIESTRTRESPLD 

SEG 

PRD hhhheeeecccccccccccccceeeccccccccccccccccccccccccccccccccccc 

SEQ TTNFSKEKDINEFPLCSLPEGVDQEVSKQLPVDIQEEILSGKSREKFQGKGSVSCPLHAS 

SEG 

PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhcccceeeeecccccccchhhh 

SEQ RGVLSFFSKKQMQDIPINPRDHLSSSKQVSSVSPCEPGTSGFNSSSSSYMSSQKDYSYYL 

SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxxx . 

PRD hcccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhh 

SEQ DNRLKDERISQGPKEPQGFHFTNSNPAVSAFHSFPNLQSEQLFSRNHTTDSHKQTVATDS 

SEG 

PRD hhhhhhhhhhcccccccceeeeccccceeecccccccchhhhhhhccccccceeeeeecc 

SEQ HEGLTENREPDSVDEKITFPSDIDPQVFYELPEAVQKELLAEWKRTGSDFHIGHK 

SEG 

PRD ccccccccccccccccccccccccceeehhhhhhhhhhhhhhhhhcccccccccc 
Prosite for DKFZphfbr2_72bl8.2

PS00001   24->28   ASN_GLYCOSYLATION   PDOC00001
PS00001  160->164  ASN_GLYCOSYLATION   PDOC00001
PS00001  483->487  ASN_GLYCOSYLATION   PDOC00001
PS00001  583->587  ASN_GLYCOSYLATION   PDOC00001
PS00001  646->650  ASN_GLYCOSYLATION   PDOC00001
PS00004  309->313  CAMP_PHOSPHO_SITE   PDOC00004
PS00004  347->351  CAMP_PHOSPHO_SITE   PDOC00004
PS00005   26->29   PKC_PHOSPHO_SITE    PDOC00005
PS00005  106->109  PKC_PHOSPHO_SITE    PDOC00005
PS00005  201->204  PKC_PHOSPHO_SITE    PDOC00005
PS00005  246->249  PKC_PHOSPHO_SITE    PDOC00005
PS00005  257->260  PKC_PHOSPHO_SITE    PDOC00005
PS00005  265->268  PKC_PHOSPHO_SITE    PDOC00005
PS00005  307->310  PKC_PHOSPHO_SITE    PDOC00005
PS00005  341->344  PKC_PHOSPHO_SITE    PDOC00005
PS00005  351->354  PKC_PHOSPHO_SITE    PDOC00005
PS00005  418->421  PKC_PHOSPHO_SITE    PDOC00005
PS00005  435->438  PKC_PHOSPHO_SITE    PDOC00005
PS00005  438->441  PKC_PHOSPHO_SITE    PDOC00005
PS00005  442->445  PKC_PHOSPHO_SITE    PDOC00005
PS00005  459->462  PKC_PHOSPHO_SITE    PDOC00005
PS00005  466->469  PKC_PHOSPHO_SITE    PDOC00005
PS00005  471->474  PKC_PHOSPHO_SITE    PDOC00005
PS00005  520->523  PKC_PHOSPHO_SITE    PDOC00005
PS00005  548->551  PKC_PHOSPHO_SITE    PDOC00005
PS00005  565->568  PKC_PHOSPHO_SITE    PDOC00005
PS00005  592->595  PKC_PHOSPHO_SITE    PDOC00005
PS00005  651->654  PKC_PHOSPHO_SITE    PDOC00005
PS00006   46->50   CK2_PHOSPHO_SITE    PDOC00006
PS00006  257->261  CK2_PHOSPHO_SITE    PDOC00006
PS00006  285->289  CK2_PHOSPHO_SITE    PDOC00006
PS00006  301->305  CK2_PHOSPHO_SITE    PDOC00006
PS00006  303->307  CK2_PHOSPHO_SITE    PDOC00006
PS00006  313->317  CK2_PHOSPHO_SITE    PDOC00006
PS00006  448->452  CK2_PHOSPHO_SITE    PDOC00006
PS00006  459->463  CK2_PHOSPHO_SITE    PDOC00006
PS00006  477->481  CK2_PHOSPHO_SITE    PDOC00006
PS00006  497->501  CK2_PHOSPHO_SITE    PDOC00006
PS00006  573->577  CK2_PHOSPHO_SITE    PDOC00006
PS00006  592->596  CK2_PHOSPHO_SITE    PDOC00006
PS00006  672->676  CK2_PHOSPHO_SITE    PDOC00006
PS00006  681->685  CK2_PHOSPHO_SITE    PDOC00006
PS00006  706->710  CK2_PHOSPHO_SITE    PDOC00006
PS00007  101->108  TYR_PHOSPHO_SITE    PDOC00007
PS00007  348->356  TYR_PHOSPHO_SITE    PDOC00007
PS00008    7->13   MYRISTYL            PDOC00008
PS00008  176->182  MYRISTYL            PDOC00008
PS00008  192->198  MYRISTYL            PDOC00008
PS00008  198->204  MYRISTYL            PDOC00008
PS00008  274->280  MYRISTYL            PDOC00008
PS00008  663->669  MYRISTYL            PDOC00008
PS00009  335->339  AMIDATION           PDOC00009
PS00013  186->197  PROKAR_LIPOPROTEIN  PDOC00013



(No Pfam data available for DKFZphfbr2_72bl8.2)






DKFZphfbr2_72dl3 



group: brain derived 

DKFZphfbr2_72dl3 encodes a novel 165 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

unknown 

seems to be testis-specific: 9 of 10 EST hits are from testis libraries
Sequenced by LMU
Locus: unknown
Insert length: 723 bp 

Poly A stretch at pos. 704, no polyadenylation signal found 



1 AGGGGGGGTA TGGGGGAGGG GGAGACTCTG CAGGAGCCTA ATTCCCCACT

51 CTGAGCTCAC CCTTCTGTCT GCCCGGGCCC TACCCCTTCC CCTACTCTCA

101 CCCTTATAAT CCTTTTCAGC ACTAGGTCTT CCCGTCACCT CCACCTCTCT

151 CCATGACCCG GCTCTGCTTA CCCAGACCCG AAGCACGTGA GGATCCGATC

201 CCAGTTCCTC CAAGGGGCCT GGGTGCTGGG GAGGGGTCAG GTAGTCCAGT

251 GCGTCCACCT GTATCCACCT GGGGCCCTAG CTGGGCCCAG CTCCTGGACA

301 GTGTCCTATG GCTGGGGGCA CTAGGACTGA CAATCCAGGC AGTCTTTTCC

351 ACCACTGGCC CAGCCCTGCT GCTGCTTCTG GTCAGCTTCC TCACCTTTGA

401 CCTGCTCCAT AGGCCCGCAG GTCACACTCT GCCACAGCGC AAACTTCTCA

451 CCAGGGGCCA GAGTCAGGGG GCCGGTGAAG GTCCTGGACA GCAGGAGGCT

501 CTACTCCTGC AAATGGGTAC AGTCTCAGGA CAACTTAGCC TCCAGGACGC

551 ACTGCTGCTG CTGCTCATGG GGCTGGGCCC GCTCCTGAGA GCCTGTGGCA

601 TGCCCTTGAC CCTGCTTGGC CTGGCTTTCT GCCTCCATCC TTGGGCCTGA

651 GAGCCCCTCC CCACAACTCA GTGTCCTTCA AATATACAAT GACCACCCTT

701 CTTCAAAAAA AAAAAAAAAA AAC



BLAST Results 



Entry HS860F19 from database EMBLNEW:
Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 860F19
Score = 2059, P = 1.1e-85, identities = 423/434
2 exons



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 153 bp to 647 bp; peptide length: 165 
Category: putative protein 
Classification: no clue 

1 MTRLCLPRPE AREDPIPVPP RGLGAGEGSG SPVRPPVSTW GPSWAQLLDS 
51 VLWLGALGLT IQAVFSTTGP ALLLLLVSFL TFDLLHRPAG HTLPQRKLLT 
101 RGQSQGAGEG PGQQEALLLQ MGTVSGQLSL QDALLLLLMG LGPLLRACGM 
151 PLTLLGLAFC LHPWA 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72dl3, frame 3 
No Alert BLASTP hits found 
Pedant information for DKFZphfbr2_72dl3, frame 3



Report for DKFZphfbr2_72dl3.3



[LENGTH] 165 

[MW] 17393.73 

[pI] 7.80

[BLOCKS] BL00068A Malate dehydrogenase proteins

[KW] TRANSMEMBRANE 2

[KW] LOW_COMPLEXITY 29.70 %



SEQ MTRLCLPRPEAREDPIPVPPRGLGAGEGSGSPVRPPVSTWGPSWAQLLDSVLWLGALGLT
SEG
PRD ccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhcccccc
MEM

SEQ IQAVFSTTGPALLLLLVSFLTFDLLHRPAGHTLPQRKLLTRGQSQGAGEGPGQQEALLLQ
SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxxx
PRD eeeecccccchhhhhhhhhhhhhhccccccccccccccccccccccccccccchhhhhhh
MEM MMMMMMMMMMMMMMMMM

SEQ MGTVSGQLSLQDALLLLLMGLGPLLRACGMPLTLLGLAFCLHPWA
SEG xxxxxxxxxxxxxxxxxxxx
PRD hcccccchhhhhhhhhhhhccchhhhhcccccchhhhhhhccccc
MEM MMMMMMMMMMMMMMMMM



(No Prosite data available for DKFZphfbr2_72dl3.3)
(No Pfam data available for DKFZphfbr2_72dl3.3)
DKFZphfbr2_72112



group: nucleic acid management 

Summary DKFZphfbr2_72112 encodes a novel 344 amino acid protein with similarity to YDR126w and 
other S. cerevisiae proteins. 

The novel protein contains a myc-type, helix-loop-helix dimerization domain signature. This
helix-loop-helix domain mediates protein dimerization and has been found in proteins such as
the myc family of cellular oncogenes, proteins involved in myogenesis, and vertebrate proteins
that bind specific DNA sequences in various immunoglobulin chain enhancers. The protein could
therefore be a novel DNA-binding protein.

The new protein can find application in modulating gene expression.



similarity to YDR126w;
membrane regions: 2

similarity to YDR126w 

complete cDNA, complete cds, EST hits

Sequenced by LMU 

Locus: unknown 

Insert length: 1270 bp 

Poly A stretch at pos. 1251, no polyadenylation signal found 



1 GGGGGCGCCC GGGAGGCGCC GGAGCCCAGC GGCTGGCGCC AGATCCAGGC 

51 TCCTGGAAGA ACCATGTCCG GCAGCTACTG GTCATGCCAG GCACACACTG 

101 CTGCCCAAGA GGAGCTGCTG TTTGAATTAT CTGTGAATGT TGGGAAGAGG 

151 AATGCCAGAG CTGCCGGCTG AAAATTACCC AACCAAGAGA AATCTGCAGG 

201 ATGGACTTTC TGGTCCTCTT CTTGTTCTAC CTGGCTTCGG TGCTGATGGG 

251 TCTTGTTCTT ATCTGCGTCT GCTCGAAAAC CCATAGCTTG AAAGGCCTGG 

301 CCAGGGGAGG AGCACAGATA TTTTCCTGTA TAATTCCAGA ATGTCTTCAG 

351 AGAGCCGTGC ATGGATTGCT TCATTACCTT TTCCATACGA GAAACCACAC 

401 CTTCATTGTC CTGCACCTGG TCTTGCAAGG GATGGTTTAT ACTGAGTACA 

451 CCTGGGAAGT ATTTGGCTAC TGTCAGGAGC TGGAGTTGTC CTTGCATTAC 

501 CTTCTTCTGC CCTATCTGCT GCTAGGTGTA AACCTGTTTT TTTTCACCCT 

551 GACTTGTGGA ACCAATCCTG GCATTATAAC AAAAGCAAAT GAATTATTAT 

601 TTCTTCATGT TTATGAATTT GATGAAGTGA TGTTTCCAAA GAACGTGAGG 

651 TGCTCTACTT GTGATTTAAG GAAACCAGCT CGATCCAAGC ACTGCAGTGT 

701 GTGTAACTGG TGTGTGCACC GTTTCGACCA TCACTGTGTT TGGGTGAACA 

751 ACTGCATCGG GGCCTGGAAC ATCAGGTACT TCCTCATCTA CGTCTTGACC 

801 TTGACGGCCT CGGCTGCCAC CGTCGCCATT GTGAGCACCA CTTTTCTGGT 

851 CCACTTGGTG GTGATGTCAG ATTTATACCA GGAGACTTAC ATCGATGACC 

901 TTGGACACCT CCATGTTATG GACACGGTCA TTCTTATTCA GTACCTGTTC 

951 CTGACTTTTC CACGGATTGT CTTCATGCTG GGCTTTGTCG TGGTCCTGAG 

1001 CTTCCTCCTG GGTGGCTACC TGTTGTCTGT CCTGTATCTG GCGGCCACCA 

1051 ACCAGACTAC TAACGAGTGG TACAGAGGTG TCTGGGCCTG GTGCCAGCGT 

1101 TGTCCCCTTG TGGCCTGGCC TCCGTCAGCA GAGCCCCAAG TCCACCGGAA 

1151 CATTCACTCC CATGGGCTTC GGAGCAACCT TCAAGAGATC TTTCTACCTG 

1201 CCTTTCCATG TCATGAGAGG AAGAAACAAG AATGACAAGT GTATGACTGC 

1251 CAAAAAAAAA AAAAAAAAAC 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 201 bp to 1232 bp; peptide length: 344
Category: similarity to unknown protein 
1 MDFLVLFLFY LASVLMGLVL ICVCSKTHSL KGLARGGAQI FSCIIPECLQ
51 RAVHGLLHYL FHTRNHTFIV LHLVLQGMVY TEYTWEVFGY CQELELSLHY 
101 LLLPYLLLGV NLFFFTLTCG TNPGIITKAN ELLFLHVYEF DEVMFPKNVR 
151 CSTCDLRKPA RSKHCSVCNW CVHRFDHHCV WVNNCIGAWN IRYFLIYVLT 
201 LTASAATVAI VSTTFLVHLV VMSDLYQETY IDDLGHLHVM DTVILIQYLF 
251 LTFPRIVFML GFVVVLSFLL GGYLLSVLYL AATNQTTNEW YRGVWAWCQR
301 CPLVAWPPSA EPQVHRNIHS HGLRSNLQEI FLPAFPCHER KKQE 
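Each ORF line in these records gives a 1-based, inclusive nucleotide range that starts at the initiator ATG and ends just before the stop codon (for this insert, the TGA at 1233-1235), so the peptide length is the range length divided by three: (1232 - 201 + 1) / 3 = 344. A minimal Python sketch of that check, assuming this convention (the arithmetic holds for every ORF listed in this section); the function name is illustrative only:

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Codons in a 1-based, inclusive ORF range that excludes the stop codon."""
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span of {span} nt is not a positive multiple of 3")
    return span // 3


# (ORF start, ORF end, listed peptide length) as given in this section
listed = {
    "DKFZphfbr2_72112": (201, 1232, 344),
    "DKFZphfbr2_72ml6": (87, 947, 287),
    "DKFZphfbr2_72nl2": (227, 577, 117),
    "DKFZphfbr2_78c24": (201, 1889, 563),
    "DKFZphfbr2_78dl3": (125, 901, 259),
}

for clone, (start, end, length) in listed.items():
    assert peptide_length(start, end) == length, clone
    print(f"{clone}: {peptide_length(start, end)} aa")
```

A span that is not a multiple of three usually points to a mis-transcribed coordinate, which is why the sketch raises instead of rounding.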

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72112, frame 3 

TREMBL:SPBC13G1_7 gene: "SPBC13G1.07"; product: "hypothetical protein";
S. pombe chromosome II cosmid C13G1., N = 2, Score = 247, P = 1.4e-22

TREMBL:CED2021_3 gene: "D2021.2"; Caenorhabditis elegans cosmid
D2021., N = 1, Score = 209, P = 9e-17

TREMBL:CEC43H6_2 gene: "C43H6.7"; Caenorhabditis elegans cosmid
C43H6., N = 1, Score = 206, P = 5.2e-15

PIR:S52691 probable membrane protein YDR126w - yeast (Saccharomyces
cerevisiae), N = 1, Score = 207, P = 8.4e-15

PIR:E71607 metal binding protein (DHHC domain) PFB0725C - malaria
parasite (Plasmodium falciparum), N = 1, Score = 182, P = 1.1e-13

>TREMBL:SPBC13G1_7 gene: "SPBC13G1.07"; product: "hypothetical protein";
S. pombe chromosome II cosmid C13G1.
Length = 356

HSPs: 

Score = 247 (37.1 bits), Expect = 1.4e-22, Sum P(2) = 1.4e-22
Identities = 55/148 (37%), Positives = 85/148 (57%)

Query: 52 AVHGLLHYLFHTRNH- -TFI VLHLVLQGM VYTEYTWEVFGYCQELELSLHYLLLPY 105 

A+ L +Y+ + N F+ L L+ G+ +Y + F + + L +LLPY 

Sbjct: 64 AMRSLSNYVLYKNNPLWFLYLALITIGIASFFIYGSSLTQKFSIIDWISV-LTSVLLPY 122 

Query: 106 LLLGVNLFFFTLTCGTNPGIITKANELLFLHVYEFD-EVMFPKNVRCSTCDLRKPARSKH 164 

++L+ + +NPG I N + +D ++ FP +CSTC KPARSKH 

Sbjct: 123 ISLY IAAKSNPGKIDLKNWNEASRRFPYDYKIFFPN — KCSTCKFEKPARSKH 173 

Query: 165 CSVCNWCVHRFDHHCVWVNNCIGAWNIRYFLIYVL 199

           C +CN CV +FDHHC+W+NNC+G  N RYF +++L
Sbjct: 174 CRLCNICVEKFDHHCIWINNCVGLNNARYFFLFLL 208



Score = 43 (6.5 bits), Expect = 1.4e-22, Sum P(2) = 1.4e-22
Identities = 10/35 (28%), Positives = 17/35 (48%)

Query: 257 VFMLGFVV-VLSFLLGGYLLSVLYLAATNQTTNEW 290

           VF++  +  VL   L GY   ++Y   T   + +W
Sbjct: 254 VFLISLICSVLVLCLLGYEFFLVYAGYTTNESEKW 288
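The Identities and Positives figures can be recovered from an alignment's middle line: residue letters mark identical columns and "+" marks conservative substitutions, so the counts do not depend on how the midline spacing survived extraction. The printed percentages appear to be truncated rather than rounded (110/116 = 94.8 % is shown as 94 % in the MM46 alignment). A small Python sketch under those assumptions, checked against the second HSP above; the helper names are illustrative:

```python
from typing import Tuple


def hsp_counts(midline: str) -> Tuple[int, int]:
    """Count identities (letters) and positives (letters plus '+') in a BLAST midline."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


def blast_percent(count: int, length: int) -> int:
    """Integer percentage, truncated the way the HSP headers in this listing appear to be."""
    return 100 * count // length


midline = "VF++ + VL L GY ++Y T + +W"  # second HSP above, 35 aligned columns
ident, pos = hsp_counts(midline)
print(f"Identities = {ident}/35 ({blast_percent(ident, 35)}%), "
      f"Positives = {pos}/35 ({blast_percent(pos, 35)}%)")
# prints: Identities = 10/35 (28%), Positives = 17/35 (48%)
```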



Pedant information for DKFZphfbr2_72112, frame 3 
Report for DKFZphfbr2_72112.3



[LENGTH] 344 

[MW] 39677.23

[pI] 7.26

[HOMOL] TREMBL:SPBC13G1_7 gene: "SPBC13G1.07"; product: "hypothetical protein"; S. pombe

chromosome II cosmid C13G1. 3e-17

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR126w] 1e-16

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins

[S. cerevisiae, YDR264c] 8e-05



[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YDR264c] 8e-05

[PIRKW] transmembrane protein 4e-15

[SUPFAM] ankyrin repeat homology 1e-10

[SUPFAM] unassigned ankyrin repeat proteins 1e-10

[PROSITE] MYRISTYL 4

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 1

[PROSITE] ASN_GLYCOSYLATION 2

[KW] SIGNAL_PEPTIDE 30

[KW] TRANSMEMBRANE 2

[KW] LOW_COMPLEXITY 16.57 %



SEQ MDFLVLFLFYLASVLMGLVLICVCSKTHSLKGLARGGAQIFSCIIPECLQRAVHGLLHYL

SEG

PRD ccchhhhhhhhhhhhhhheeeeeeccccceeeeecccceeeeeeehhhhhhhhhhhheee

MEM

SEQ FHTRNHTFIVLHLVLQGMVYTEYTWEVFGYCQELELSLHYLLLPYLLLGVNLFFFTLTCG

SEG xxxxxxxxxxxxxxxxxxx

PRD ecccchhhhhhhhhhccchhhhhhhheeeeccceeehhhhhhhhhhhhhhcccceeeecc

MEM MMMMMMMMMMMMMMMMMMMMMMMMM

SEQ TNPGIITKANELLFLHVYEFDEVMFPKNVRCSTCDLRKPARSKHCSVCNWCVHRFDHHCV

SEG

PRD ccccccccccchhhhhhhhhcccccccceeeecccccccccccccccceeeecccccccc

MEM M MMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ WVNNCIGAWNIRYFLIYVLTLTASAATVAIVSTTFLVHLVVMSDLYQETYIDDLGHLHVM

SEG xxxxxxxxxxxxxxxxx

PRD cccccccccccchhhhhhhhhccchhhhhhhhhhhhhhhhhccccccccccccccccchh

MEM

SEQ DTVILIQYLFLTFPRIVFMLGFVVVLSFLLGGYLLSVLYLAATNQTTNEWYRGVWAWCQR

SEG xxxxxxxxxxxxxxxxxxxxx

PRD hhhhhhhhhhhhhhhhccccccccceeecccchhhhhhhhhcccchhhhhhhhhhhcccc

MEM

SEQ CPLVAWPPSAEPQVHRNIHSHGLRSNLQEIFLPAFPCHERKKQE

SEG

PRD cccccccccccccceeecccccccccceeeeecccccccccccc

MEM



Prosite for DKFZphfbr2_72112.3



PS00001 65->69 ASN_GLYCOSYLATION PDOC00001
PS00001 284->288 ASN_GLYCOSYLATION PDOC00001
PS00005 29->32 PKC_PHOSPHO_SITE PDOC00005
PS00006 152->156 CK2_PHOSPHO_SITE PDOC00006
PS00006 229->233 CK2_PHOSPHO_SITE PDOC00006
PS00006 286->290 CK2_PHOSPHO_SITE PDOC00006
PS00008 32->38 MYRISTYL PDOC00008
PS00008 77->83 MYRISTYL PDOC00008
PS00008 120->126 MYRISTYL PDOC00008
PS00008 322->328 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_72112.3)
DKFZphfbr2_72ml6
group: unknown 

DKFZphfbr2_72ml6 encodes a novel 287 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 
Sequenced by LMU 

Locus: /map="26.2 cR from top of Chr16 linkage group"
Insert length: 1462 bp 

Poly A stretch at pos. 1441, polyadenylation signal at pos. 1421

1 GGGGAGGACC GGAGGACCGA GGACAGAAAG ATTGGTGGAC AGGAGCAGCG 
51 GCCGGTGGGG AGGGCGCTCG GCGGCGGCCT GCGGCCATGG CCACCGTGAT 

101 GGCAGCGACG GCGGCGGAGC GGGCGGTGCT GGAGGAGGAG TTCCGCTGGC 

151 TGCTGCACGA CGAGGTGCAC GCTGTGTTGA AGCAGCTGCA GGACATCCTC 

201 AAGGAGGCCT CTCTGCGCTT CACTCTGCCG GGCTCCGGCA CTGAGGGGCC 

251 CGCCAAGCAA GAGAACTTCA TCCTAGGCAG CTGTGGCACA GACCAGGTGA 

301 AGGGTGTGCT GACTCTGCAG GGGGATGCCC TCAGCCAGGC GGATGTGAAC 

351 CTGAAGATGC CCCGGAACAA CCAGCTGCTG CACTTCGCCT TCCGGGAGGA 

401 CAAGCAGTGG AAGCTGCAGC AGATCCAGGA TGCCAGAAAC CATGTGAGCC 

451 AAGCCATTTA CCTGCTTACC AGCCGGGACC AGAGCTACCA GTTCAAGACG 

501 GGCGCTGAGG TCCTCAAGCT GATGGACGCA GTGATGCTGC AGCTGACCAG 

551 AGCCCGAAAC CGGCTCACCA CCCCCGCCAC CCTCACCCTC CCCGAGATCG 

601 CCGCCAGCGG CCTCACGCGG ATGTTCGCCC CTGCCCTGCC GTCCGACCTG 

651 CTGGTCAACG TCTACATCAA CCTCAACAAG CTCTGCCTCA CGGTGTACCA 

701 GCTGCATGCC CTGCAGCCCA ACTCCACCAA GAACTTCCGC CCAGCTGGGG 

751 GGGCGGTGCT GCATAGCCCT GGGGCCATGT TCGAGTGGGG CTCTCAGCGC 

801 CTGGAGGTGA GCCACGTGCA CAAAGTGGAG TGCGTGATCC CCTGGCTCAA 

851 CGACGCCCTG GTCTACTTCA CCGTCTCCCT GCAGCTCTGC CAGCAGCTTA 

901 AGGACAAGAT CTCCGTGTTC TCCAGCTACT GGAGCTACAG ACCCTTCTGA 

951 TCACAGCACC CAGGAGCTTG TCTCCAGGAA GGCGGCCCCG TCCCCTACTC 
1001 ATACCCACCA CAGAGCACCA GCCAGTGCCA ACGCCAGGCT GCTATTTATC 
1051 TCCCTATCCC ACCCCCTACC CCACCTAACA CATTTGCACT GCCGGGAATG 
1101 GACACTGGAA GTGCCAGGAG GAAGGAAGGC TGGTTTGGTG GGGTAGTGGG 
1151 GAGGTCAGGG AGGCGGGGCC AAGGGTGTCC CACATTCCCA ACACCGCCCT 
1201 CTGATCACCA TGGGAATCTT TGGACTCAGG ACAGGGCCAG GCGCAGGGCT 
1251 CTCCCTCCTC TCCCCTTCGC TGTCCCCTCC CCCTGGAGGG CATGGTGTCG 
1301 GGGGGTGGCA CTGAGCTATG AGTCCCGGGG ATGGTGAGGA ACGCCACAGA 
1351 CAGAGCCACC CTAGGAGTGA GTATAGTGCT GGTGACTGTG TTTCATAGCC 
1401 CCAGTCCAGG GCTGTCTAAG AAATAAAGAT CATCAGACTC CAAAAAAAAA 
1451 AAAAAAAAAA AC 
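The "Poly A stretch" and "polyadenylation signal" positions line up with the sequence ruler if they are read as 0-based offsets: in this insert the AATAAA hexamer begins at ruler position 1422 and the terminal run of A's at 1442, one more than the listed 1421 and 1441. A Python sketch of a scan that reproduces those two values; the 0-based convention, treating AATAAA as the only signal, and the function name are assumptions, not the annotation pipeline actually used:

```python
from typing import Optional, Tuple

SIGNAL = "AATAAA"  # canonical polyadenylation signal hexamer (assumed to be the only one searched)


def polya_features(seq: str, offset: int = 0) -> Tuple[Optional[int], Optional[int]]:
    """Return 0-based (signal_start, tail_start), shifted by `offset` for sequence fragments."""
    seq = seq.upper()
    end = len(seq)
    # Skip trailing non-A characters (several inserts here end in "...AAAAC").
    while end > 0 and seq[end - 1] != "A":
        end -= 1
    start = end
    # Walk back over the terminal run of A's.
    while start > 0 and seq[start - 1] == "A":
        start -= 1
    if start == end:
        return None, None
    # Last signal hexamer lying entirely upstream of the tail.
    signal = seq.rfind(SIGNAL, 0, start)
    return (signal + offset if signal != -1 else None), start + offset


# Ruler positions 1401-1462 of this insert; index 0 is 0-based offset 1400.
tail_fragment = ("CCAGTCCAGG" "GCTGTCTAAG" "AAATAAAGAT" "CATCAGACTC" "CAAAAAAAAA"
                 "AAAAAAAAAA" "AC")
print(polya_features(tail_fragment, offset=1400))  # (1421, 1441)
```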



BLAST Results 



Entry HS604351 from database EMBL: 
human STS WI-18474. 
Score = 1178, P = 1.5e-48, identities = 250/268 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 87 bp to 947 bp; peptide length: 287 
Category: similarity to unknown protein 
1 MATVMAATAA ERAVLEEEFR WLLHDEVHAV LKQLQDILKE ASLRFTLPGS
51 GTEGPAKQEN FILGSCGTDQ VKGVLTLQGD ALSQADVNLK MPRNNQLLHF 
101 AFREDKQWKL QQIQDARNHV SQAIYLLTSR DQSYQFKTGA EVLKLMDAVM 
151 LQLTRARNRL TTPATLTLPE IAASGLTRMF APALPSDLLV NVYINLNKLC 
201 LTVYQLHALQ PNSTKNFRPA GGAVLHSPGA MFEWGSQRLE VSHVHKVECV 
251 IPWLNDALVY FTVSLQLCQQ LKDKISVFSS YWSYRPF 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72ml6, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_72ml6, frame 3 



Report for DKFZphfbr2_72ml6.3 



[LENGTH] 287 

[MW] 32254.40 

[pI] 8.30

[HOMOL] TREMBL:AF025459_2 gene: "H14A12.3"; Caenorhabditis elegans cosmid H14A12. 3e-14 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 6 

[PROSITE] PKC_PHOSPHO_SITE 5 

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 6.27 %



SEQ MATVMAATAAERAVLEEEFRWLLHDEVHAVLKQLQDILKEASLRFTLPGSGTEGPAKQEN 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhh 

SEQ FILGSCGTDQVKGVLTLQGDALSQADVNLKMPRNNQLLHFAFREDKQWKLQQIQDARNHV 

SEG 

PRD hhccccccceeeeeeeeccccchhhhhhhcccccchhhhhhhhhchhhhhhhhhhhhchh 

SEQ SQAIYLLTSRDQSYQFKTGAEVLKLMDAVMLQLTRARNRLTTPATLTLPEIAASGLTRMF 

SEG 

PRD hhhhhhhhccccceeecchhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccc 

SEQ APALPSDLLVNVYINLNKLCLTVYQLHALQPNSTKNFRPAGGAVLHSPGAMFEWGSQRLE 

SEG 

PRD cccccccceeeeehhhhhhhhhhheeeecccccccccccccceeecccccccccccccee 

SEQ VSHVHKVECVIPWLNDALVYFTVSLQLCQQLKDKISVFSSYWSYRPF

SEG 

PRD eeeeeeeeeeeecccceeeeeeehhhhhhhhhhhhheeeeeeeeccc 
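The [KW] LOW_COMPLEXITY value reads as the share of residues that SEG masks with x: 18 of the 287 residues here, or 6.27 %. The same arithmetic reproduces the 16.57 % (57/344) given for DKFZphfbr2_72112 and, applied to the C marks of the COILS track, the 10.48 % COILED_COIL value (59/563) for DKFZphfbr2_78c24. A Python sketch, assuming that definition; the function name is illustrative:

```python
def masked_percent(track: str, length: int, mark: str = "x") -> float:
    """Percentage of residues carrying `mark` in a per-residue annotation track."""
    if length <= 0:
        raise ValueError("sequence length must be positive")
    return round(100 * track.count(mark) / length, 2)


seg_track = "x" * 18  # SEG mask from this report: 18 masked residues
print(f"LOW_COMPLEXITY {masked_percent(seg_track, 287)} %")
# prints: LOW_COMPLEXITY 6.27 %
```

Only the count of marked positions matters, so the calculation is unaffected by the lost column alignment of the SEG and COILS rows.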



Prosite for DKFZphfbr2_72ml6.3



PS00001 212->216 ASN_GLYCOSYLATION PDOC00001
PS00005 42->45 PKC_PHOSPHO_SITE PDOC00005
PS00005 128->131 PKC_PHOSPHO_SITE PDOC00005
PS00005 213->216 PKC_PHOSPHO_SITE PDOC00005
PS00005 236->239 PKC_PHOSPHO_SITE PDOC00005
PS00005 283->286 PKC_PHOSPHO_SITE PDOC00005
PS00006 8->12 CK2_PHOSPHO_SITE PDOC00006
PS00006 50->54 CK2_PHOSPHO_SITE PDOC00006
PS00006 83->87 CK2_PHOSPHO_SITE PDOC00006
PS00006 128->132 CK2_PHOSPHO_SITE PDOC00006
PS00006 138->142 CK2_PHOSPHO_SITE PDOC00006
PS00006 167->171 CK2_PHOSPHO_SITE PDOC00006
PS00008 64->70 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_72ml6.3)
DKFZphfbr2_72nl2



group: brain derived 

DKFZphfbr2_72nl2 encodes a novel 117 amino acid protein with similarity to a protein whose
sequence is conserved in bacteria and eukaryotes.

The novel protein is very similar to human MM46, human and rat ganglioside expression factor 2
(GEF2), C. elegans 14.8 kDa protein C32D5.9 and Laccaria bicolor symbiosis-related protein
LBU93506_1. The function of these highly conserved proteins is not known.

The new protein can find application in studying the expression profile of brain-specific
genes.

strong similarity to rat GANGLIOSIDE EXPRESSION FACTOR 2 (GEF-2) 
complete cDNA, complete cds, EST hits 
Sequenced by LMU 
Locus : /map= ,, 12'* 
Insert length: 1880 bp 

Poly A stretch at pos. 1859, polyadenylation signal at pos. 1830

1 GGGGGCCGGT ATTTCTCCAT CTGGCTCTCC TCTACCTCCA GGCAGGCTCA 

51 CCCGAGATCC CCGCCCCGAA CCCCCCCTGC ACACTCGGCC CAGCGCTGTT 

101 GCCCCCGGAG CGGACGTTTC TGCAGCTATT CTGAGCACAC CTTGACGTCG 

151 GCTGAGGGAG CGGGACAGGG TCAGCGGCGA AGGAGGCAGG CCCCGCGCGG 

201 GGATCTCGGA AGCCCTGCGG TGCATCATGA AGTTCCAGTA CAAGGAGGAC 

251 CATCCCTTTG AGTATCGGAA AAAGGAAGGA GAAAAGATCC GGAAGAAATA 

301 TCCGGACAGG GTCCCCGTGA TTGTAGAGAA GGCTCCAAAA GCCAGGGTGC

351 CTGATCTGGA CAAGAGGAAG TACCTAGTGC CCTCTGACCT TACTGTTGGC 

401 CAGTTCTACT TCTTAATCCG GAAGAGAATC CACCTGAGAC CTGAGGACGC 

451 CTTATTCTTC TTTGTCAACA ACACCATCCC TCCCACCAGT GCTACCATGG 

501 GCCAACTGTA TGAGGACAAT CATGAGGAAG ACTATTTTCT GTATGTGGCC 

551 TACAGTGATG AGAGTGTCTA TGGGAAATGA GTGGTTGGAA GCCCAGCAGA 

601 TGGGAGCACC TGGACTTGGG GGTAGGGGAG GGGTGTGTGT GCGCGACATG 

651 GGGAAAGAGG GTGGCTCCCA CCGCAAGGAG ACAGAAGGTG AAGACATCTA 

701 GAAACATTAC ACCACACACA CCGTCATCAC ATTTTCACAT GCTCAATTGA 

751 TATTTTTTGC TGCTTCCTCG GCCCAGGGAG AAAGCATGTC AGGACAGAGC 

801 TGTTGGATTG GCTTTGATAG AGGAATGGGG ATGATGTAAG TTTACAGTAT 

851 TCCTGGGGTT TAATTGTTGT GCAGTTTCAT AGATGGGTCA GGAGGTGGAC 

901 AAGTTGGGGC CAGAGATGAT GGCAGTCCAG CAGCAACTCC CTGTGCTCCC 

951 TTCTCTTTGG GCAGAGATTC TATTTTTGAC ATTTGCACAA GACAGGTAGG 

1001 GAAAGGGGAC TTGTGGTAGT GGACCATACC TGGGGACCAA AAGAGACCCA 

1051 CTGTAATTGA TGCATTGTGG CCCCTGATCT TCCCTGTCTC ACACTTCTTT 

1101 TCTCCCATCC CGGTTGCAAT CTCACTCAGA CATCACAGTA CCACCCCAGG 

1151 GGTGGCAGTA GACAACAACC CAGAAATTTA GACAGGGATC TCTTACCTTT 

1201 GGAAAATAGG GGTTAGGCAT GAAGGTGGTT GTGATTAAGA AGATGGTTTT 

1251 GTTATTAAAT AGCATTAAAC TGGAATTGAC AAGAGTGTTG AGCATCCCTG 

1301 TCTAACCTGC TCTTTCTCTT TGGTGCCCCT TATCTCACCC CTTCCTTGGA 

1351 ATTTAATAAG TCTCAGGCAT TTCCAATTGT AGACTAAAAC CACTCTTAGC 

1401 ATCTCCTCTA GTATTTTCCA TGTATCAGGA AAGAGGTGTC TTATGTAGGG 

1451 AGGGGGCAAG TATGAAGTAA GGTAATTATA TACTACTCTC ATTCAGGATT 

1501 CTTGCTCCCA TGCTGCTGTC CCTTCAGGCT CACATGCACA GGAATGCTAC 

1551 ATGATGGCCA GCTGCTTCCC TCCTTGGTTA TCATCCACTG CAGCTGCTAG 

1601 TTAGAAAGGT TTGGAGGGAT GACTTTTAGT AAATCATGGG GATTTTATTG 

1651 ATTTATTTTC ACTTTTGGGA TTTTGTGGGG TGGGAGTGGG GAGCAGGAAT 

1701 TGCACTCAGA CATGACATTT CAATTCATCT CTGCTAATGA AAAGGGTTCT 

1751 TTCTCTTGGG GGAAATGTGT GTGTCAGTTC TGTCAGCTGC AAGTTCTTGT 

1801 ATAATGAAGT CAATGCCATC AGGCCAAGGA AATAAAATAA TTGCTTACCT 

1851 TAAAAATCGA AAAAAAAAAA AAAAAAAAAC 



BLAST Results 



Entry HS418210 from database EMBL:
human STS SHGC-10496.
Score = 1916, P = 4.0e-80, identities = 394/400

Entry AC006514 from database EMBL NEW:

*** SEQUENCING IN PROGRESS *** Homo sapiens; HTGS phase 1, 68 unordered
pieces.

Score = 610, P = 2.7e-16, identities = 128/134
4 exons 
Medline entries



No Medline entry



Peptide information for frame 2 



ORF from 227 bp to 577 bp; peptide length: 117 
Category: strong similarity to known protein 



1 MKFQYKEDHP FEYRKKEGEK IRKKYPDRVP VIVEKAPKAR VPDLDKRKYL 
51 VPSDLTVGQF YFLIRKRIHL RPEDALFFFV NNTIPPTSAT MGQLYEDNHE 
101 EDYFLYVAYS DESVYGK 
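The peptide is the conceptual translation of the ORF (nucleotides 227-577) under the standard genetic code; the listed ORF end excludes the TGA stop at 578-580. A minimal Python sketch of that translation step; the 64-codon table is the standard code, while the function name and the use of X for unrecognised codons are illustrative choices:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate codon by codon until the first stop; unknown codons become 'X'."""
    cds = cds.upper()
    peptide = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


print(translate("ATGAAGTTCCAGTACAAGGAGGAC"))  # first 8 codons of the ORF -> MKFQYKED
print(translate("GTCTATGGGAAATGA"))           # last 4 codons plus the TGA stop -> VYGK
```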

BLASTP hits

Entry YQD9_CAEEL from database SWISSPROT: 

HYPOTHETICAL 14.8 KD PROTEIN C32D5.9 IN CHROMOSOME II. 

Score = 496, P = 1.8e-47, identities = 91/116, positives = 105/116

Entry SYRP_LACBI from database SWISSPROT: 
SYMBIOSIS-RELATED PROTEIN.

Score = 390, P = 3.1e-36, identities = 68/117, positives = 94/117
Entry LBU93506_1 from database TREMBL:

product: "symbiosis-related protein"; Laccaria bicolor
symbiosis-related protein mRNA, partial cds.

Score = 390, P = 3.1e-36, identities = 68/117, positives = 94/117

Entry GEF2_RAT from database SWISSPROT:
GANGLIOSIDE EXPRESSION FACTOR 2 (GEF-2).

Score = 373, P = 2.0e-34, identities = 71/116, positives = 88/116



Alert BLASTP hits for DKFZphfbr2_72nl2, frame 2

TREMBLNEW:AF044671_1 product: "MM46"; Homo sapiens MM46 mRNA, complete
cds., N = 1, Score = 549, P = 4.7e-53

SWISSPROT:GEF2_HUMAN GANGLIOSIDE EXPRESSION FACTOR 2 (GEF-2)., N = 1,
Score = 373, P = 2.1e-34

>TREMBLNEW:AF044671_1 product: "MM46"; Homo sapiens MM46 mRNA, complete
cds.

Length = 117

HSPs:

Score = 549 (82.4 bits), Expect = 4.7e-53, P = 4.7e-53
Identities = 101/116 (87%), Positives = 110/116 (94%)

Query: 1 MKFQYKEDHPFEYRKKEGEKIRKKYPDRVPVIVEKAPKARVPDLDKRKYLVPSDLTVGQF 60 

MKF YKE+HPFE R+ EGEKIRKKYPDRVPVIVEKAPKAR+ DLDK+KYLVPSDLTVGQF 
Sbjct: 1 MKFVYKEEHPFEKRRSEGEKIRKKYPDRVPVIVEKAPKARIGDLDKKKYLVPSDLTVGQF 60 

Query: 61 YFLIRKRIHLRPEDALFFFVNNTIPPTSATMGQLYEDNHEEDYFLYVAYSDESVYG 116 

YFLIRKRIHLR EDALFFFVNN IPPTSATMGQLY+++HEED+FLY+AYSDESVYG 
Sbjct: 61 YFLIRKRIHLRAEDALFFFVNNVIPPTSATMGQLYQEHHEEDFFLYIAYSDESVYG 116 

Pedant information for DKFZphfbr2_72nl2, frame 2

Report for DKFZphfbr2_72nl2.2

[LENGTH] 117

[MW] 14044.07

[pI] 8.67

[HOMOL] TREMBL:AF044671_1 product: "MM46"; Homo sapiens MM46 mRNA, complete cds. 1e-56
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBL078c] 4e-36

[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YBL078c] 4e-36

[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YBL078c] 4e-36

[SUPFAM] hypothetical protein YBL078C 8e-35

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta 



SEQ MKFQYKEDHPFEYRKKEGEKIRKKYPDRVPVIVEKAPKARVPDLDKRKYLVPSDLTVGQF 

PRD cccccccccchhhhhhhhhhhhhhccccceeeeeccccccccccccceeecccccchhhh 

SEQ YFLIRKRIHLRPEDALFFFVNNTIPPTSATMGQLYEDNHEEDYFLYVAYSDESVYGK

PRD hhhhhhhhhhccccceeeeecccccccchhhhhhhhhccccceeeeeeecccccccc 



Prosite for DKFZphfbr2_72nl2.2
PS00001 81->85 ASN_GLYCOSYLATION PDOC00001



(No Pfam data available for DKFZphfbr2_72nl2.2)
DKFZphfbr2_78c24



group: signal transduction 

DKFZphfbr2_78c24 encodes a novel 563 amino acid protein with strong similarity to guanylate-binding proteins (GBPs).

GBPs were originally described as proteins that are strongly induced by interferons and are
capable of binding to agarose-immobilized guanine nucleotides. hGBP1, the first of two members
of this protein family in humans, represents a novel type of GTPase. The novel protein
contains an ATP/GTP-binding site motif A (P-loop) and an RGD cell attachment site. It appears to
be a new member of the GBP family and shows a splicing pattern not previously described.

The new protein can find application in modulating or blocking the response of cells to
interferons.



strong similarity to guanine nucleotide-binding protein 1/2,
but a different "splice variant": aa 211-245 of GBP1/2 are missing



Sequenced by MediGenomix 



Locus: unknown



Insert length: 2952 bp 

Poly A stretch at pos. 2927, polyadenylation signal at pos. 2914 



1 CAGTTTCATT AGGCTCTGAA GCCATTACAA AGGTTGCTTA ACTTCTAATT

51 ATTTGATCAC TGAGGAAAAT CCAGAAAGCT ACACAACACT GAAGGGGTGA

101 AATAAAAGTC CAGCGATCCA GCGAAAGAAA AGAGAAGTGA CAGAAACAAC

151 TTTACCTGGA CTGAAGATAA AAGCACAGAC AAGAGAACAA TGCCCTGGAC

201 ATGGCTCCAG AGATCCACAT GACAGGCCCA ATGTGCCTCA TTGAGAACAC

251 TAATGGGGAA CTGGTGGCGA ATCCAGAAGC TCTGAAAATC CTGTCTGCCA

301 TTACACAGCC TGTGGTGGTG GTGGCAATTG TGGGCCTCTA CCGCACAGGA

351 AAATCCTACC TGATGAACAA GCTAGCTGGG AAGAATAAGG GCTTCTCTCT

401 GGGCTCCACA GTGAAATCTC ACACCAAAGG AATCTGGATG TGGTGTGTGC

451 CTCACCCCAA AAAGCCAGAA CACACCTTAG TCCTGCTTGA CACTGAGGGC

501 CTGGGAGATG TAAAGAAGGG TGACAACCAG AATGACTCCT GGATCTTCAC

551 CCTGGCCGTC CTCCTGAGCA GCACTCTCGT GTACAATAGC ATGGGAACCA

601 TCAACCAGCA GGCTATGGAC CAACTGTACT ATGTGACAGA GCTGACACAT

651 CGAATCCGAT CAAAATCCTC ACCTGATGAG AATGAGAATG AGGATTCAGC

701 TGACTTTGTG AGCTTCTTCC CAGATTTTGT GTGGACACTG AGAGATTTCT

751 CCCTGGACTT GGAAGCAGAT GGACAACCCC TCACACCAGA TGAGTACCTG

801 GAGTATTCCC TGAAGCTAAC GCAAGGTAAC AGGAAGCTTG CCCAGCTTGA

851 GAAACTACAA GATGAAGAGC TGGACCCTGA ATTTGTGCAA CAAGTAGCAG

901 ACTTCTGTTC CTACATCTTT AGCAATTCCA AAACTAAAAC TCTTTCAGGA

951 GGCATCAAGG TCAATGGGCC TTGTCTAGAG AGCCTAGTGC TGACCTATAT

1001 CAATGCTATC AGCAGAGGGG ATCTGCCCTG CATGGAGAAC GCAGTCCTGG

1051 CCTTGGCCCA GATAGAGAAC TCAGCCGCAG TGCAAAAGGC TATTGCCCAC

1101 TATGACCAGC AGATGGGCCA GAAGGTGCAG CTGCCCGCAG AAACCCTCCA

1151 GGAGCTGCTG GACCTGCACA GGGTTAGTGA GAGGGAGGCC ACTGAAGTCT

1201 ATATGAAGAA CTCTTTCAAG GATGTGGACC ATCTGTTTCA AAAGAAATTA

1251 GCGGCCCAGC TAGACAAAAA GCGGGATGAC TTTTGTAAAC AGAATCAAGA

1301 AGCATCATCA GATCGTTGCT CAGCTTTACT TCAGGTCATT TTCAGTCCTC

1351 TAGAAGAAGA AGTGAAGGCG GGAATTTATT CGAAACCAGG GGGCTATTGT

1401 CTCTTTATTC AGAAGCTACA AGACCTGGAG AAAAAGTACT ATGAGGAACC

1451 AAGGAAGGGG ATACAGGCTG AAGAGATTCT GCAGACATAC TTGAAATCCA

1501 AGGAGTCTGT GACCGATGCA ATTCTACAGA CAGACCAGAT TCTCACAGAA

1551 AAGGAAAAGG AGATTGAAGT GGAATGTGTA AAAGCTGAAT CTGCACAGGC

1601 TTCAGCAAAA ATGGTGGAGG AAATGCAAAT AAAGTATCAG CAGATGATGG

1651 AAGAGAAAGA GAAGAGTTAT CAAGAACATG TGAAACAATT GACTGAGAAG

1701 ATGGAGAGGG AGAGGGCCCA GTTGCTGGAA GAGCAAGAGA AGACCCTCAC

1751 TAGTAAACTT CAGGAACAGG CCCGAGTACT AAAGGAGAGA TGCCAAGGTG

1801 AAAGTACCCA ACTTCAAAAT GAGATACAAA AGCTACAGAA GACCCTGAAA

1851 AAAAAAACCA AGAGATATAT GTCGCATAAG CTAAAGATCT AAACAACAGA

1901 GCTTTTCTGT CATCCTAACC CAAGGCATAA CTGAAACAAT TTTAGAATTT

1951 GGAACAAGTG TCACTATATT TGATAATAAT TAGATCTTGC ATCATAACAC

2001 TAAAAGTTTA CAAGAACATG CAGTTCAATG ATCAAAATCA TGTTTTTTCC

2051 TTAAAAAGAT TGTAAATTGT GCAACAAAGA TGCATTTACC TCTGTACCAA

2101 CAGAGGAGGG ATCATGAGTT GCCACCACTC AGAAGTTTAT TCTTCCAGAC

2151 GACCAGTGGA TACTGAGGAA AGTCTTAGGT AAAAATCTTG GGACATATTT

2201 GGGCACTGGT TTGGCCAAGT GTACAATAGG TCCCAATATC AGAAACAACC

2251 ATCCTAGCTT CCTAGGGAAG ACAGTGTACA GTTCTCCATT ATATCAAGGC

2301 TACAAGGTCT ATGAGCAATA ATGTGATTTC TGGACATTGC CCATGGATAA

2351 TTCTCACTGA TGGATCTCAA GCTAAAGCAA ACCATCTTAT ACAGAGATCT

2401 AGAATCTTAT ATTTTCCATA GGAAGGTAAA GAAATCATTA GCAAGAGTAG

2451 GAATTGAATC ATAAACAAAT TGGCTAATGA AGAAATCTTT TCTTTCTTGT

2501 TCAATTCATC TAGATTATAA CCTTAATGTG ACACCTGAGA CCTTTAGACA
2551 GTTGACCCTG AATTAAATAG TCACATGGTA ACAATTATGC ACTGTGTAAT

2601 TTTAGTAATG TATAACATGC AATGATGCAC TTTAACTGAA GATAGAGACT 

2651 ATGTTAGAAA ATTGAACTAA TTTAATTATT TGATTGTTTT AATCCTAAAG 

2701 CATAAGTTAG TCTTTTCCTG ATTCTTAAAG GTCATACTTG AAATCCTGCC 

2751 AATTTTCCCC AAAGGGAATA TGGAATTTTT TTTGACTTTC TTTTGAGCAA 

2801 TAAAATAATT GTCTTGCCAT TACTTAGTAT ATGTAGACTT CATCCCAATT 

2851 GTCAAACATC CTAGGTAAGT GGTTGACATT TCTTACAGCA ATTACAGATT 

2901 ATTTTTGAAC TAGAAATAAA CTAAACTAGA AACAAAAAAA AAAAAAAAAA 

2951 AA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 201 bp to 1889 bp; peptide length: 563 
Category: strong similarity to known protein 
Classification: Cell signaling/communication 
Prosite motifs: RGD (272-275) 
ATP_GTP_A (45-53) 



1 MAPEIHMTGP MCLIENTNGE LVANPEALKI LSAITQPVVV VAIVGLYRTG 

51 KSYLMNKLAG KNKGFSLGST VKSHTKGIWM WCVPHPKKPE HTLVLLDTEG 

101 LGDVKKGDNQ NDSWIFTLAV LLSSTLVYNS MGTINQQAMD QLYYVTELTH 

151 RIRSKSSPDE NENEDSADFV SFFPDFVWTL RDFSLDLEAD GQPLTPDEYL 

201 EYSLKLTQGN RKLAQLEKLQ DEELDPEFVQ QVADFCSYIF SNSKTKTLSG 

251 GIKVNGPCLE SLVLTYINAI SRGDLPCMEN AVLALAQIEN SAAVQKAIAH 

301 YDQQMGQKVQ LPAETLQELL DLHRVSEREA TEVYMKNSFK DVDHLFQKKL 

351 AAQLDKKRDD FCKQNQEASS DRCSALLQVI FSPLEEEVKA GIYSKPGGYC 

401 LFIQKLQDLE KKYYEEPRKG IQAEEILQTY LKSKESVTDA ILQTDQILTE 

451 KEKEIEVECV KAESAQASAK MVEEMQIKYQ QMMEEKEKSY QEHVKQLTEK 

501 MERERAQLLE EQEKTLTSKL QEQARVLKER CQGESTQLQN EIQKLQKTLK 

551 KKTKRYMSHK LKI 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78c24, frame 3

PIR:A41268 guanine nucleotide-binding protein 1 - human, N = 2, Score =
1306, P = 4.9e-238

PIR:A46459 macrophage-activation gene-1 protein mag-1 - mouse, N = 2,
Score = 942, P = 8.9e-184

PIR:S70524 guanine nucleotide-binding protein 2 - human, N = 2, Score =
1131, P = 4.1e-210

TREMBL:AF077007_1 gene: "Gbp2"; product: "interferon-induced guanylate
binding protein GBP-2"; Mus musculus interferon-induced guanylate
binding protein GBP-2 (Gbp2) mRNA, complete cds., N = 2, Score = 904, P
= 1.2e-179



>PIR:A41268 guanine nucleotide-binding protein 1 - human 
Length = 592

HSPs: 

Score = 1306 (195.9 bits), Expect = 4.9e-238, Sum P(2) = 4.9e-238
Identities = 264/332 (79%), Positives = 288/332 (86%) 

Query: 211 RKLAQLEKLQDEELDPEFVQQVADFCSYIFSNSKTKTLSGGIKVNGPCLESLVLTYINAI 270

           RKLAQLEKLQDEELDPEFVQQVADFCSYIFSNSKTKTLSGGI+VNGP LESLVLTY+NAI
Sbjct: 245 RKLAQLEKLQDEELDPEFVQQVADFCSYIFSNSKTKTLSGGIQVNGPRLESLVLTYVNAI 304
Query: 271 SRGDLPCMENAVLALAQIENSAAVQKAIAHYDQQMGQKVQLPAETLQELLDLHRVSEREA 330

           S GDLPCMENAVLALAQIENSAAVQKAIAHY+QQMGQKVQLP E+LQELLDLHR SEREA
Sbjct: 305 SSGDLPCMENAVLALAQIENSAAVQKAIAHYEQQMGQKVQLPTESLQELLDLHRDSEREA 364

Query: 331 TEVYMKNSFKDVDHLFQKKLAAQLDKKRDDFCKQNQEASSDRCSALLQVIFSPLEEEVKA 390

            EV++++SFKDVDHLFQK+LAAQL+KKRDDFCKQNQEASSDRCS LLQVIFSPLEEEVKA
Sbjct: 365 IEVFIRSSFKDVDHLFQKELAAQLEKKRDDFCKQNQEASSDRCSGLLQVIFSPLEEEVKA 424

Query: 391 GIYSKPGGYCLFIQKLQDLEKKYYEEPRKGIQAEEILQTYLKSKESVTDAILQTDQILTX 450

GIYSKPGGY LF+QKLQDL+KKYYEEPRKGIQAEEILQTYLKSKES+TDAILQTDQ LT 
Sbjct: 425 GIYSKPGGYRLFVQKLQDLKKKYYEEPRKGIQAEEILQTYLKSKESMTDAILQTDQTLTE 484 

Query: 451 XXXXXXXXXXXXXSAQASAKMVEEMQIKYQQMMEEKEKSYQEHVKQLTEKMXXXXXXXXX 510

                        SAQASAKM++EMQ K +QMME+KE+SYQEH+KQLTEKM
Sbjct: 485 KEKEIEVERVKAESAQASAKMLQEMQRKNEQMMEQKERSYQEHLKQLTEKMENDRVQLLK 544

Query: 511 XXXKTLTSKLQEQARVLKERCQGESTQLQNEI 542

              +TL  KLQEQ ++LKE  Q ES  ++NEI
Sbjct: 545 EQERTLALKLQEQEQLLKEGFQKESRIMKNEI 576

Score = 1012 (151.8 bits), Expect = 4.9e-238, Sum P(2) = 4.9e-238
Identities = 194/211 (91%), Positives = 200/211 (94%) 

Query: 1 MAPEIHMTGPMCLIENTNGELVANPEALKILSAITQPVVVVAIVGLYRTGKSYLMNKLAG 60

         MA EIHMTGPMCLIENTNG L+ANPEALKILSAITQP+VVVAIVGLYRTGKSYLMNKLAG
Sbjct: 1 MASEIHMTGPMCLIENTNGRLMANPEALKILSAITQPMVVVAIVGLYRTGKSYLMNKLAG 60

Query: 61 KNKGFSLGSTVKSHTKGIWMWCVPHPKKPEHTLVLLDTEGLGDVKKGDNQNDSWIFTLAV 120 

K KGFSLGSTV+SHTKGIWMWCVPHPKKP H LVLLDTEGLGDV+KGDNQNDSWIF LAV 
Sbjct: 61 KKKGFSLGSTVQSHTKGIWMWCVPHPKKPGHILVLLDTEGLGDVEKGDNQNDSWIFALAV 120 

Query: 121 LLSSTLVYNSMGTINQQAMDQLYYVTELTHRIRSKSSPDENENE--DSADFVSFFPDFVW 178

           LLSST VYNS+GTINQQAMDQLYYVTELTHRIRSKSSPDENENE  DSADFVSFFPDFVW
Sbjct: 121 LLSSTFVYNSIGTINQQAMDQLYYVTELTHRIRSKSSPDENENEVEDSADFVSFFPDFVW 180

Query: 179 TLRDFSLDLEADGQPLTPDEYLEYSLKLTQG 209 

TLRDFSLDLEADGQPLTPDEYL YSLKL +G 
Sbjct: 181 TLRDFSLDLEADGQPLTPDEYLTYSLKLKKG 211 

Pedant information for DKFZphfbr2_78c24, frame 3



Report for DKFZphfbr2_78c24.3

[LENGTH] 563 

[MW] 64127.72 

[pI] 5.45

[HOMOL] PIR:A41268 guanine nucleotide-binding protein 1 - human 0.0 

[SUPFAM] guanine nucleotide-binding protein 1 0.0 

[PROSITE] ATP_GTP_A 1

[PROSITE] RGD 1 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 6.75 %

[KW] COILED_COIL 10.48 % 

SEQ MAPEIHMTGPMCLIENTNGELVANPEALKILSAITQPVVVVAIVGLYRTGKSYLMNKLAG 

SEG 

PRD cccccccccceeeeeccccchhhhhhhhhhhhhhhcceeeeeeeecccccchhhhhhhhh 

COILS 

MEM . MMMMMMMMMMMMMMMMM 

SEQ KNKGFSLGSTVKSHTKGIWMWCVPHPKKPEHTLVLLDTEGLGDVKKGDNQNDSWIFTLAV

SEG 

PRD cccccccccccccccceeeeeecccccccceeeeeeeccccccccccccccchhhhhhhh 

COILS 

MEM 

SEQ LLSSTLVYNSMGTINQQAMDQLYYVTELTHRIRSKSSPDENENEDSADFVSFFPDFVWTL

SEG 

PRD hhhhheeeccccchhhhhhhhhhhhhhhhhhhhhcccccccccccccceeeeccceeeeh 

COILS 

MEM 

SEQ RDFSLDLEADGQPLTPDEYLEYSLKLTQGNRKLAQLEKLQDEELDPEFVQQVADFCSYIF 

SEG 

PRD hhhhhhhhccccccccchhhhhhhhhhccchhhhhhhhhhhhhcccchhhhhhhhhhhhc 

COILS 
MEM

SEQ SNSKTKTLSGGIKVNGPCLESLVLTYINAISRGDLPCMENAVLALAQIENSAAVQKAIAH 

SEG 

PRD cccceeeccccccccccchhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ YDQQMGQKVQLPAETLQELLDLHRVSEREATEVYMKNSFKDVDHLFQKKLAAQLDKKRDD 

SEG 

PRD hhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ FCKQNQEASSDRCSALLQVIFSPLEEEVKAGIYSKPGGYCLFIQKLQDLEKKYYEEPRKG 

SEG 

PRD hhhhhhchhhhhhhhhhhhhhhhhhhhhhcccccccccceeehhhhhhhhhhhhhccccc 

COILS 

MEM 

SEQ IQAEEILQTYLKSKESVTDAILQTDQILTEKEKEIEVECVKAESAQASAKMVEEMQIKYQ 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 



SEQ QMMEEKEKSYQEHVKQLTEKMERERAQLLEEQEKTLTSKLQEQARVLKERCQGESTQLQN 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCC 

MEM 



SEQ EIQKLQKTLKKKTKRYMSHKLKI 

SEG ..xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhccc 

COILS CCCCCCC 

MEM 



Prosite for DKFZphfbr2_78c24.3

PS00016 272->275 RGD PDOC00016 
PS00017 45->53 ATP_GTP_A PDOC00017
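Each Prosite line gives the pattern ID, a hit range, the pattern name and the documentation entry. The ranges use a 1-based start and an exclusive end: the RGD tripeptide occupies residues 272-274 and is reported as 272->275, and the eight-residue P-loop match starting at Gly45 as 45->53. A Python sketch of such a scan, with the PROSITE patterns for the motifs that occur in this section rewritten as regular expressions; the regex transcription and the helper names are assumptions, and real PROSITE tools offer further options (for example, excluding patterns with a high probability of occurrence) that are not modelled here:

```python
import re
from typing import List, Tuple

# PROSITE patterns as Python regexes: x -> '.', {..} -> [^..], x(n) -> .{n}
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",         # PS00001
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",              # PS00005
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",           # PS00006
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",  # PS00008
    "RGD": r"RGD",                                 # PS00016
    "ATP_GTP_A": r"[AG].{4}GK[ST]",                # PS00017
}


def scan(protein: str, name: str, offset: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) hits with a 1-based start and exclusive end; overlaps allowed.

    `offset` shifts the coordinates when `protein` is a fragment of a longer sequence.
    """
    # A zero-width lookahead lets overlapping matches be reported.
    regex = re.compile(f"(?=({PATTERNS[name]}))")
    hits = []
    for m in regex.finditer(protein):
        start = m.start() + 1 + offset
        hits.append((start, start + len(m.group(1))))
    return hits


n_terminus = "MAPEIHMTGPMCLIENTNGELVANPEALKILSAITQPVVVVAIVGLYRTGKSYLMNKLAG"  # residues 1-60
print(scan(n_terminus, "ATP_GTP_A"))  # [(45, 53)]
```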



(No Pfam data available for DKFZphfbr2_78c24.3)
DKFZphfbr2_78dl3

group: brain derived 

DKFZphfbr2_78dl3 encodes a novel 259 amino acid protein with similarity to C. elegans putative 
protein from cosmid K08B12. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to C. elegans K08B12.3 
Sequenced by MediGenomix 

Locus: /map="338.4 cR from top of Chr18 linkage group"
Insert length: 2195 bp 

Poly A stretch at pos. 2175, polyadenylation signal at pos. 2156

1 CGTCCGTCGG GCAGCAGCGG GGCTGTCTAT CCCGGCTGAG GACCCGCGGC 
51 CAGTGCGGGT GGCTGGCTTT GCCATTAGCG GGGGCCTTTC CTGAGGACGG 

101 CGTACGGAGT GTGGGGAATG AAGGATGGCA GCATGCCGTG CATTAAAAGC 

151 TGTTTTGGTA GATCTCAGTG GCACACTTCA CATTGAAGAT GCAGCTGTGC 

201 CAGGCGCACA GGAAGCTCTT AAAAGGTTAC GTGGTGCTTC TGTAATCATT 

251 AGGTTTGTGA CCAATACAAC CAAAGAGAGC AAGCAAGACC TGTTAGAAAG 

301 GTTGAGAAAA TTGGAATTTG ATATCTCTGA AGATGAAATA TTCACATCTC 

351 TGACTGCAGC CAGAAGTTTA CTAGAGCGGA AACAAGTCAG ACCCATGCTG 

401 CTAGTTGATG ATCGGGCACT ACCTGATTTC AAAGGAATAC AAACAAGTGA 

451 TCCTAATGCT GTGGTCATGG GATTGGCACC AGAACATTTT CATTATCAAA 

501 TTCTGAATCA AGCATTCCGG TTACTCCTGG ATGGAGCACC TCTGATAGCA 

551 ATCCACAAAG CCAGGTATTA CAAGAGGAAA GATGGCTTAG CCCTGGGGCC 

601 TGGACCATTT GTGACTGCTT TAGAGTATGC CACAGATACC AAAGCCACAG 

651 TCGTGGGGAA ACCAGAGAAG ACGTTCTTTT TGGAAGCATT GCGGGGCACT 

701 GGCTGTGAAC CTGAGGAGGC TGTCATGATA GGAGATGATT GCAGGGATGA 

751 TGTTGGTGGG GCTCAAGATG TCGGCATGCT GGGCATCTTA GTAAAGACTG 

801 GGAAATATCG AGCATCAGAT GAAGAAAAAA TTAATCCACC TCCTTACTTA 

851 ACTTGTGAGA GTTTCCCTCA TGCTGTGGAC CACATTCTGC AGCACCTATT 

901 GTGAAGCAAT GTGTGCATCT GAAGCAACTT GAAATGCAGC TTCTTATTGT 

951 CTGGAATGAA TCCCTTACCA ACTCAGTGCC AGCATCGGTA GACACCAGTC 
1001 AGTGCTGATC GCTTTTTAAC CCTCTTTTGT TGTGCATTAA TTAGAAAGAA 
1051 AGGTATTGAA TTGCGGCTAG CCAGTAAGCC TTGCTAATCT CTTTTATTTT 
1101 GTAACTGAAG ATGAGACCCA AAGAAAGGGA AAGCTGAGAT TTTGTGCCAT 
1151 TCCTTTTAAA ATATTCATCA GGTTAGGTGG GGCTGTGGGG GAAAAGCTAC 
1201 TACAGGGAAG AGTGTTCTCT GCTGTCTCTT CACTGGAAAA CAGGGAGGGG 
1251 GGATTTCAGA CTGTGAAGAA AGTTGAATGG TGGTTTTTAA ATTATAAAGT 
1301 AATGTATTAA AAGGTGCATT AGGCTGTAGT TCTAATATTG AGTTCAACTG 
1351 TGAAATCCAT CAGATGTGCC AAATGGAGAA GACAGAAAGC AACAAAGTGA 
1401 ATTGTTCTTT AGCCCAAGTG GTACAGTGAA TTTGCTTTAA CAGATGTTGA
1451 AAACTAAATT TTCTACTGTA TTCCCAGCAC GGGTGACTTC TTTTTCTCTT
1501 CATTAGCCAG AGATGACTAA TTTAAATTTA GAACCAGATT TTAATTTAAA 
1551 TTAATATTTC CATTAATAAC CTACTCATTG CAGATACCTA TTATACTGTG 
1601 TAACAGTTGT TTTGGAAATT TTATGTAAAA TTAAAACTAT CAGTATTTTA 
1651 CAGATGTTTT AATTAGACAT TGTTATTAAC AGGAACAGTG CAGAAACTAG 
1701 AATCAAGCCT TATAATATCT TATAGACCAT GCATTTTTGA AGTTAGTGTC 
1751 CACTAGGGTC CTATTAACTG TACATTTGCA AGATTTCATT ATTTTTGCCT 
1801 CTGACACTAT GGGAAAAATT TTTTAGAAGC TATTGGGACA GATTCAAGCT 
1851 TTTATGCACT TGGTTACTAC AGCTGTAAAA TGAAATCTCG TCTTGTAGCA 
1901 TGGATTATTC TTCTCATGTT AAACCCACCA AAATAAAGGG GACTAAATAG 
1951 GTAATGATTT TCCTAGTGCA TTTGCATACT GTGATAATCC TGGGCCTTGC 
2001 AATAGTTCTA CAGGGCTCTT GGGCATTGAA TTATTAGGAT GTAATTGTAC 
2051 ATCATTGTAG TGTTCACCTT ATTGAAGCTC ACTCTGATGT TAATGAGCTT 
2101 CGGGTTTTGA TGCTTGTTTA GAGATCAGCA GTCTTGGATG GGAGGGAACA 
2151 AAGCTAAATA AATGTTAGTT TGGTGAAAAA AAAAAAAAAA AAAAA 

BLAST Results 



Entry HS599355 from database EMBL: 
human STS WI-13484. 
Score = 1262, P = 3.6e-52, identities = 274/289



Medline entries 
No Medline entry



Peptide information for frame 2 



ORF from 125 bp to 901 bp; peptide length: 259 
Category: similarity to unknown protein 
Classification: no clue 



1 MAACRALKAV LVDLSGTLHI EDAAVPGAQE ALKRLRGASV IIRFVTNTTK 

51 ESKQDLLERL RKLEFDISED EIFTSLTAAR SLLERKQVRP MLLVDDRALP 

101 DFKGIQTSDP NAVVMGLAPE HFHYQILNQA FRLLLDGAPL IAIHKARYYK

151 RKDGLALGPG PFVTALEYAT DTKATVVGKP EKTFFLEALR GTGCEPEEAV 

201 MIGDDCRDDV GGAQDVGMLG ILVKTGKYRA SDEEKINPPP YLTCESFPHA 

251 VDHILQHLL 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_78dl3, frame 2 

TREMBL:CEUK08B12_1 gene: "K08B12.3"; Caenorhabditis elegans cosmid
K08B12., N = 1, Score = 609, P = 2.2e-59

TREMBL:CEC13C4_5 gene: "C13C4.4"; Caenorhabditis elegans cosmid C13C4,
N = 1, Score = 408, P = 4.4e-38



>TREMBL:CEUK08B12_1 gene: "K08B12.3"; Caenorhabditis elegans cosmid 
K08B12. 

Length = 257 



HSPs: 



Score = 609 (91.4 bits), Expect = 2.2e-59, P = 2.2e-59 
Identities = 132/251 (52%), Positives = 172/251 (68%)



Query: 7 LKAVLVDLSGTLHIEDAAVPGAQEALKRLRGASVIIRFVTNTTKESKQDLLERLRKLEFD 66

         + +VL+DLSGT+HIE+ A+PGAQ AL+ LR  + + +FVTNTTKESK+ L +RL    F
Sbjct: 4 ISSVLIDLSGTIHIEEFAIPGAQTALELLRQHAKV-KFVTNTTKESKRLLHQRLINCGFK 62

Query: 67 ISEDEIFTSLTAARSLLERKQVRPMLLVDDRALPDFKGIQTSDPNAVVMGLAPEHFHYQI 126

          + ++EIFTSLTAAR L+ + Q RP  +VDDRA+ DF+GI T DPNAVV+GLAPE F+
Sbjct: 63 VEKEEIFTSLTAARDLIVKNQYRPFFIVDDRAMEDFEGISTDDPNAVVIGLAPEKFNDTT 122

Query: 127 LNQAFRLLLDG-APLIAIHKARYYKRKDGLALGPGPFVTALEYATDTKATVVGKPEKTFF 185

           L  AFRL+ +  A LIAI+K RY++   GL LGPG +V  LEY+   +AT+VGKP K FF
Sbjct: 123 LTHAFRLIKEKKASLIAINKGRYHQTNAGLCLGPGTYVAGLEYSAGVEATIVGKPNKLFF 182

Query: 186 LEALRGTG--CEPEEAVMIGDDCRDDVGGAQDVGMLGILVKTGKYRASDEEKINPPPYLT 243

             AL+      +   AVMIGDD  DD  GA  +GM  ILVKTGK+R  DE K+
Sbjct: 183 ESALQSLNENVDFSSAVMIGDDVNDDALGAIKIGMRAILVKTGKFRDGDELKVKN----V 238

Query: 244 CESFPHAVDHILQH 257

             SF  AV+ I+++
Sbjct: 239 ANSFVDAVNMIIEN 252





Pedant information for DKFZphfbr2_78d13, frame 2

Report for DKFZphfbr2_78d13.2

[LENGTH] 259

[MW] 28536.04

[pI] 5.84

[HOMOL] TREMBL:CEUK08B12_1 gene: "K08B12.3"; Caenorhabditis elegans cosmid K08B12.
62

[FUNCAT] r general function prediction [M. jannaschii, MJ1437] 3e-05

[SUPFAM] nagD protein 4e-18

[KW] Alpha_Beta 
SEQ MAACRALKAVLVDLSGTLHIEDAAVPGAQEALKRLRGASVIIRFVTNTTKESKQDLLERL

PRD ccccccceeeeeecccceeeecccccchhhhhhhhhhccceeeeeeccccchhhhhhhhh 

SEQ RKLEFDISEDEIFTSLTAARSLLERKQVRPMLLVDDRALPDFKGIQTSDPNAVVMGLAPE 

PRD hhhccccccceeeehhhhhhhhhhhhccceeeeeechhhhhhccccccccceeeeecccc 

SEQ HFHYQILNQAFRLLLDGAPLIAIHKARYYKRKDGLALGPGPFVTALEYATDTKATVVGKP 

PRD chhhhhhhhhhhhhhccceeeeeccccccccccccccccccchhhhhhhhccceeeeccc 

SEQ EKTFFLEALRGTGCEPEEAVMIGDDCRDDVGGAQDVGMLGILVKTGKYRASDEEKINPPP 

PRD cchhhhhhhhhhccccceeeeecccchhhhhhhhhccceeeeeeeccccccccccccccc 

SEQ YLTCESFPHAVDHILQHLL 

PRD cccccchhhhhhhhhhccc 



(No Prosite data available for DKFZphfbr2_78d13.2)
(No Pfam data available for DKFZphfbr2_78d13.2)
DKFZphfbr2_78k24



group: metabolism 

DKFZphfbr2_78k24 encodes a novel 372 amino acid protein with similarity to Mus musculus 
ubiquitin specific protease UBP43. 

The novel protein contains the Prosite ubiquitin carboxyl-terminal hydrolases family 2
signature 2. Ubiquitin carboxyl-terminal hydrolases (UCH; EC 3.1.2.15), also known as
deubiquitinating enzymes, are thiol proteases that recognize and hydrolyze the peptide bond at
the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as of ubiquitinated proteins.

The new protein can find application in modulating protein stability and degradation in cells.



Ubiquitin carboxyl-terminal hydrolases family 2 signature 2. 



strong similarity to mouse ubiquitin specific protease UBP43 

Sequenced by MediGenomix 

Locus: unknown

Insert length: 1874 bp 

Poly A stretch at pos. 1852, polyadenylation signal at pos. 1836 



1 AGTCCCGACG TGGAACTCAG CAGCGGAGGC TGGACGCTTG CATGGCGCTT
51 GAGAGATTCC ATCGTGCCTG GCTCACATAA GCGCTTCCTG GAAGTGAAGT
101 CGTGCTGTCC TGAACGCGGG CCAGGCAGCT GCGGCCTGGG GGTTTTGGAG
151 TGATCACGAA TGAGCAAGGC GTTTGGGCTC CTGAGGCAAA TCTGTCAGTC
201 CATCCTGGCT GAGTCCTCGC AGTCCCCGGC AGATCTTGAA GAAAAGAAGG
251 AAGAAGACAG CAACATGAAG AGAGAGCAGC CCAGAGAGCG TCCCAGGGCC
301 TGGGACTACC CTCATGGCCT GGTTGGTTTA CACAACATTG GACAGACCTG
351 CTGCCTTAAC TCCTTGATTC AGGTGTTCGT AATGAATGTG GACTTCACCA
401 GGATATTGAA GAGGATCACG GTGCCCAGGG GAGCTGACGA GCAGAGGAGA
451 AGCGTCCCTT TCCAGATGCT TCTGCTGCTG GAGAAGATGC AGGACAGCCG
501 GCAGAAAGCA GTGCGGCCCC TGGAGCTGGC CTACTGCCTG CAGAAGTGCA
551 ACGTGCCCTT GTTTGTCCAA CATGATGCTG CCCAACTGTA CCTCAAACTC
601 TGGAACCTGA TTAAGGACCA GATCACTGAT GTGCACTTGG TGGAGAGACT
651 GCAGGCCCTG TATACGATCC GGGTGAAGGA CTCCTTGATT TGCGTTGACT
701 GTGCCATGGA GAGTAGCAGA AACAGCAGCA TGCTCACCCT CCCACTTTCT
751 CTTTTTGATG TGGACTCAAA GCCCCTGAAG ACACTGGAGG ACGCCCTGCA
801 CTGCTTCTTC CAGCCCAGGG AGTTATCAAG CAAAAGCAAG TGCTTCTGTG
851 AGAACTGTGG GAAGAAGACC CGTGGGAAAC AGGTCTTGAA GCTGACCCAT
901 TTGCCCCAGA CCCTGACAAT CCACCTCATG CGATTCTCCA TCAGGAATTC
951 ACAGACGAGA AAGATCTGCC ACTCCCTGTA CTTCCCCCAG AGCTTGGATT
1001 TCAGCCAGAT CCTTCCAATG AAGCGAGAGT CTTGTGATGC TGAGGAGCAG
1051 TCTGGAGGGC AGTATGAGCT TTTTGCTGTG ATTGCGCACG TGGGAATGGC
1101 AGACTCCGGT CATTACTGTG TCTACATCCG GAATGCTGTG GATGGAAAAT
1151 GGTTCTGCTT CAATGACTCC AATATTTGCT TGGTGTCCTG GGAAGACATC
1201 CAGTGTACCT ACGGAAATCC TAACTACCAC TGGCAGGAAA CTGCATATCT
1251 TCTGGTTTAC ATGAAGATGG AGTGCTAATG GAAATGCCCA AAACCTTCAG
1301 AGATTGACAC GCTGTCATTT TCCATTTCCG TTCCTGGATC TACGGAGTCT
1351 TCTAAGAGAT TTTGCAATGA GGAGAAGCAT TGTTTTCAAA CTATATAACT
1401 GAGCCTTATT TATAATTAGG GATATTATCA AAATATGTAA CCATGAGGCC
1451 CCTCAGGTCC TGATCAGTCA GAATGGATGC TTTCACCAGC AGACCCGGCC
1501 ATGTGGCTGC TCGGTCCTGG GTGCTCGCTG CTGTGCAAGA CATTAGCCCT
1551 TTAGTTATGA GCCTGTGGGA ACTTCAGGGG TTCCCAGTGG GGAGAGCAGT
1601 GGCAGTGGGA GGCATCTGGG GGCCAAAGGT CAGTGGCAGG GGGTATTTCA
1651 GTATTATACA ACTGCTGTGA CCAGACTTGT ATACTGGCTG AATATCAGTG
1701 CTGTTTGTAA TTTTTCACTT TGAGAACCAA CATTAATTCC ATATGAATCA
1751 AGTGTTTTGT AACTGCTATT CATTTATTCA GCAAATATTT ATTGATCATC
1801 TCTTCTCCAT AAGATAGTGT GATAAACACA GTCATGAATA AAGTTATTTT
1851 CCACAAAAAA AAAAAAAAAA AAAA



BLAST Results 



Entry AC005500 from database EMBL: 
, complete sequence. 

Score = 859, P = 5.7e-143, identities = 175/179
8 exons matching bp 317-1230
Medline entries



99182491: 

A novel ubiquitin-specific protease, UBP43, cloned from leukemia
fusion protein AML1-ETO-expressing mice, functions in
hematopoietic cell differentiation.



Peptide information for frame 1 



ORF from 160 bp to 1275 bp; peptide length: 372 
Category: strong similarity to known protein 
Classification: Protein management 
Prosite motifs: UCH_2_2 (302-320)



1 MSKAFGLLRQ ICQSILAESS QSPADLEEKK EEDSNMKREQ PRERPRAWDY 
51 PHGLVGLHNI GQTCCLNSLI QVFVMNVDFT RILKRITVPR GADEQRRSVP 
101 FQMLLLLEKM QDSRQKAVRP LELAYCLQKC NVPLFVQHDA AQLYLKLWNL 
151 IKDQITDVHL VERLQALYTI RVKDSLICVD CAMESSRNSS MLTLPLSLFD 
201 VDSKPLKTLE DALHCFFQPR ELSSKSKCFC ENCGKKTRGK QVLKLTHLPQ 
251 TLTIHLMRFS IRNSQTRKIC HSLYFPQSLD FSQILPMKRE SCDAEEQSGG 
301 QYELFAVIAH VGMADSGHYC VYIRNAVDGK WFCFNDSNIC LVSWEDIQCT 
351 YGNPNYHWQE TAYLLVYMKM EC 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78k24, frame 1 

TREMBLNEW:AF069502_1 product: "ubiquitin specific protease UBP43"; Mus
musculus ubiquitin specific protease UBP43 mRNA, complete cds., N = 1,
Score = 1367, P = 1e-139

SWISSPROT:UBPE_DROME UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 64E (EC
3.1.2.15) (UBIQUITIN THIOLESTERASE 64E) (UBIQUITIN-SPECIFIC PROCESSING
PROTEASE 64E) (DEUBIQUITINATING ENZYME 64E)., N = 2, Score = 248, P =
5.3e-33

>TREMBLNEW:AF069502_1 product: "ubiquitin specific protease UBP43"; Mus
musculus ubiquitin specific protease UBP43 mRNA, complete cds.
Length = 368

HSPs:

Score = 1367 (205.1 bits), Expect = 1.0e-139, P = 1.0e-139
Identities = 262/369 (71%), Positives = 295/369 (79%)



Query: 1 MSKAFGLLRQICQSILAESSQSPADLEEKKEEDSNMKREQPRERPRAWDYPHGLVGLHNI 60
M K FGLLR+ CQS++AE Q A LEE E KR R+ AWD PHGLVGLHNI
Sbjct: 1 MGKGFGLLRKPCQSVVAEPQQYSA-LEE--ERTMKRKRVLSRDLCSAWDSPHGLVGLHNI 57

Query: 61 GQTCCLNSLIQVFVMNVDFTRILKRITVPRGADEQRRSVPFQMLLLLEKMQDSRQKAVRP 120
GQTCCLNSL+QVF+MN+DF ILKRITVPR A+E++RSVPFQ+LLLLEKMQDSRQKA+ P
Sbjct: 58 GQTCCLNSLLQVFMMNMDFRMILKRITVPRSAEERKRSVPFQLLLLLEKMQDSRQKALLP 117

Query: 121 LELAYCLQKCNVPLFVQHDAAQLYLKLWNLIKDQITDVHLVERLQALYTIRVKDSLICVD 180
EL CLQK NVPLFVQHDAAQLYL +WNL KDQITD L ERLQ L+TI ++SLICV
Sbjct: 118 TELVQCLQKYNVPLFVQHDAAQLYLTIWNLTKDQITDTDLTERLQGLFTIWTQESLICVG 177

Query: 181 CAMESSRNSSMLTLPLSLFDVDSKPLKTLEDALHCFFQPRELSSKSKCFCENCGKKTRGK 240
C ESSR S +LTL L LFD D+KPLKTLEDAL CF QP+EL+S C CE CG+KT K
Sbjct: 178 CTAESSRRSKLLTLSLPLFDKDAKPLKTLEDALRCFVQPKELASSDMC-CETCGEKTPWK 236

Query: 241 QVLKLTHLPQTLTIHLMRFSIRNSQTRKICHSLYFPQSLDFSQILPMKRESCDAEEQSGG 300
QVLKLTHLPQTLTIHLMRFS RNS+T KICHS+ FPQSLDFSQ+LP + + D +EQS
Sbjct: 237 QVLKLTHLPQTLTIHLMRFSARNSRTEKICHSVNFPQSLDFSQVLPTEEDLGDTKEQSEI 296

Query: 301 QYELFAVIAHVGMADSGHYCVYIRNAVDGKWFCFNDSNICLVSWEDIQCTYGNPNYHWQE 360
YELFAVIAHVGMAD GHYC YIRN VDGKWFCFNDS++C V+W+D+QCTYGN Y W+E
Sbjct: 297 HYELFAVIAHVGMADFGHYCAYIRNPVDGKWFCFNDSHVCWVTWKDVQCTYGNHRYRWRE 356

Query: 361 TAYLLVYMK 369
TAYLLVY K
Sbjct: 357 TAYLLVYTK 365



Pedant information for DKFZphfbr2_78k24, frame 1 
Report for DKFZphfbr2_78k24.1

[LENGTH] 372

[MW] 43011.12

[pI] 8.05

[HOMOL] TREMBLNEW:AF069502_1 product: "ubiquitin specific protease UBP43"; Mus musculus ubiquitin specific protease UBP43 mRNA, complete cds. 1e-151

[FUNCAT] 06.13 proteolysis [S. cerevisiae, YMR304w] 3e-19

[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YJL197w] 3e-16

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YMR223w] 1e-15

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL186w] 6e-12

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDR069c] 9e-11

[FUNCAT] 10.03.99 other osmosensing activities [S. cerevisiae, YDR069c] 9e-11

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR069c] 9e-11

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR069c] 9e-11

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YDR069c] 9e-11

[BLOCKS] BL00582A Ribosomal protein L33 proteins

[BLOCKS] BL00972E

[BLOCKS] BL00972D

[BLOCKS] BL00972A

[EC] 2.4.2.29 Queuine tRNA-ribosyltransferase 1e-06

[PIRKW] pentosyltransferase 1e-06

[PIRKW] glycosyltransferase 1e-06

[PIRKW] tRNA modification 1e-06

[PIRKW] alternative splicing 7e-11

[PIRKW] hydrolase 7e-06

[SUPFAM] deubiquitinating enzyme SSV7 2e-09

[PROSITE] UCH_2_2 1

[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2 

[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2 

[KW] Alpha_Beta 



SEQ MSKAFGLLRQICQSILAESSQSPADLEEKKEEDSNMKREQPRERPRAWDYPHGLVGLHNI 

PRD cccceeechhhhhhhhcccccccchhhhhhhhcccccccccccccccccccccccccccc 

SEQ GQTCCLNSLIQVFVMNVDFTRILKRITVPRGADEQRRSVPFQMLLLLEKMQDSRQKAVRP 

PRD cceeehhhhhhhhhcccchhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhccccc 

SEQ LELAYCLQKCNVPLFVQHDAAQLYLKLWNLIKDQITDVHLVERLQALYTIRVKDSLICVD 

PRD hhhhhccccccccchhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhheeeee 

SEQ CAMESSRNSSMLTLPLSLFDVDSKPLKTLEDALHCFFQPRELSSKSKCFCENCGKKTRGK 

PRD ccccccccccccccccccccccccchhhhhhhhhhhhhhcccccccceeecccccccccc 

SEQ QVLKLTHLPQTLTIHLMRFSIRNSQTRKICHSLYFPQSLDFSQILPMKRESCDAEEQSGG 

PRD cceeeecccchhhhhhhhhhhccchhhhhccccccccccccccccccccccccccccccc 

SEQ QYELFAVIAHVGMADSGHYCVYIRNAVDGKWFCFNDSNICLVSWEDIQCTYGNPNYHWQE 

PRD eeeeeeeeeeeccccccceeeeeecccccceeeeccceeeeeecccccccccccccchhh 



SEQ TAYLLVYMKMEC

PRD hhhhhhhhhccc



Prosite for DKFZphfbr2_78k24.1

PS00973 302->320 UCH_2_2 PDOC00750



Pfam for DKFZphfbr2_78k24.1



HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2 

HMM      *GIqNlGNTCYMNSIIQCL*
          G+ N+G TC +NS+IQ+
Query 56  GLHNIGQTCCLNSLIQVF 73
HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2

HMM      *YdLYgVICHYGntldyGHYWaYVKNenhHRWkWYYFDDEtV*
          Y+L++VI H G D+GHY +Y++N ++KW++F+D+++
Query 302 YELFAVIAHVG-MADSGHYCVYIRNAV--DGKWFCFNDSNI 339
DKFZphfbr2_78n23



group: brain derived 

DKFZphfbr2_78n23 encodes a novel 329 amino acid protein with similarity to A. thaliana
F26P21.80 protein.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to A.thaliana F26P21.80 
Sequenced by MediGenomix 

Locus: /map="89.1 cR from top of Chr19 linkage group"
Insert length: 1447 bp 

Poly A stretch at pos. 1374, polyadenylation signal at pos. 1353 



1 TACAACTTCC GGCTGTAAAG ATGGCGGCTT CCTAGTGAGT CGGCGGCTGA
51 CTTAGAAGGA GGTTCAGGCT ACGGTGAGCC GAAGCCACAC AGGAGCCATG
101 GAAGTGGCAG AGCCCAGCAG CCCCACTGAA GAGGAGGAGG AGGAAGAGGA
151 GCACTCGGCA GAGCCTCGGC CCCGCACTCG CTCCAATCCT GAAGGGGCTG
201 AGGACCGGGC AGTAGGGGCA CAGGCCAGCG TGGGCAGCCG CAGCGAGGGT
251 GAGGGTGAGG CCGCCAGTGC TGATGATGGG AGCCTCAACA CTTCAGGAGC
301 CGGCCCTAAG TCCTGGCAGG TGCCCCCGCC AGCCCCTGAG GTCCAAATTC
351 GGACACCAAG GGTCAACTGT CCAGAGAAAG TGATTATCTG CCTGGACCTG
401 TCAGAGGAAA TGTCACTGCC AAAGCTGGAG TCGTTCAACG GCTCCAAAAC
451 CAACGCCCTC AATGTCTCTC AGAAGATGAT TGAGATGTTC GTGCGGACAA
501 AACACAAGAT CGACAAAAGC CACGAGTTTG CACTGGTGGT GGTGAACGAT
551 GACACGGCCT GGCTGTCTGG CCTGACCTCC GACCCCCGCG AGCTCTGTAG
601 CTGCCTCTAT GATCTGGAGA CGGCCTCCTG TTCCACCTTC AATCTGGAAG
651 GACTTTTCAG CCTCATCCAG CAGAAAACTG AGCTTCCGGT CACAGAGAAC
701 GTGCAGACGA TTCCCCCGCC ATATGTGGTC CGCACCATCC TTGTCTACAG
751 CCGTCCACCT TGCCAGCCCC AGTTCTCCTT GACGGAGCCC ATGAAGAAAA
801 TGTTCCAGTG CCCATATTTC TTCTTTGACG TTGTTTACAT CCACAATGGC
851 ACTGAGGAGA AGGAGGAGGA GATGAGTTGG AAGGATATGT TTGCCTTCAT
901 GGGCAGCCTG GATACCAAGG GTACCAGCTA CAAGTATGAG GTGGCACTGG
951 CTGGGCCAGC CCTGGAGTTG CACAACTGCA TGGCGAAACT GTTGGCCCAC
1001 CCCCTGCAGC GGCCTTGCCA GAGCCATGCT TCCTACAGCC TGCTGGAGGA
1051 GGAGGATGAA GCCATTGAGG TTGAGGCCAC TGTCTGAACC ATCCCTGTAC
1101 ATCTGCACCT TCTTGTGCAA GGAAGTCCTT GGCCTAAAGC CTTGGTTCTC
1151 AAACTGGGTT CCTTGGGACC TCCGGGGTGG GGGGGTTCCA GGAGGCACGT
1201 AGGGTACCTT GCAGGGTCCT AGGAGGGAAA CCCAGGATTC CAGGAGGGAT
1251 CCCAGGAACT GTGGGCACCC ATTTTCTGTG TCTCCCAGCC CATTTCCACT
1301 CCTAGTTTGT CATGGATAAT TTTTGTTCTT CCCTGTGTGA TTTTTGCCAT
1351 CAAAATAAAA ATTTGAGACT CGTTAAAAAA AAAAAAAAAA AAAAAAAAAA
1401 AAAAAAAAAA AAAAAAAAAA AAAAAAGAAA AAAAAAAAAA AAAAAAA



BLAST Results 



Entry HS806352 from database EMBL: 
human STS EST192543. 
Score = 1285, P = 2.5e-51, identities = 263/266



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 98 bp to 1084 bp; peptide length: 329 
Category: similarity to unknown protein 
Classification: no clue 

1 MEVAEPSSPT EEEEEEEEHS AEPRPRTRSN PEGAEDRAVG AQASVGSRSE 
51 GEGEAASADD GSLNTSGAGP KSWQVPPPAP EVQIRTPRVN CPEKVIICLD
101 LSEEMSLPKL ESFNGSKTNA LNVSQKMIEM FVRTKHKIDK SHEFALVVVN
151 DDTAWLSGLT SDPRELCSCL YDLETASCST FNLEGLFSLI QQKTELPVTE 
201 NVQTIPPPYV VRTILVYSRP PCQPQFSLTE PMKKMFQCPY FFFDVVYIHN 
251 GTEEKEEEMS WKDMFAFMGS LDTKGTSYKY EVALAGPALE LHNCMAKLLA 
301 HPLQRPCQSH ASYSLLEEED EAIEVEATV 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78n23, frame 2

PIR:T05304 hypothetical protein F26P21.80 - Arabidopsis thaliana, N =
1, Score = 142, P = 1.5e-07

>PIR:T05304 hypothetical protein F26P21.80 - Arabidopsis thaliana
Length = 264

HSPs:

Score = 142 (21.3 bits), Expect = 1.5e-07, P = 1.5e-07
Identities = 56/216 (25%), Positives = 97/216 (44%)



Query: 93 EKVIICLDL-SEEMSLPKLESFNGSKTNALNVSQKMIEMFVRTKHKIDKSHEFALVVVND 151
E ++IC+D+ +E M K NG + ++ I +F+ K I+ H FA +
Sbjct: 26 EDILICIDVDAESMVEMKTTGTNGRPLIRMECVKQAIILFIHNKLSINPDHRFAFATLAK 85

Query: 152 DTAWLSG-LTSDPRELCSCLYDLE-TASCSTFNLEGLFSLIQQKTELPVTENVQTIPPPY 209
AWL TSD + L L S S +L LF Q+ ++ +N
Sbjct: 86 SAAWLKKEFTSDAESAVASLRGLSGNKSSSRADLTLLFRAAAQEAKVSRAQN-------R 138

Query: 210 VVRTILVYSRPPCQPQFSLTEPMKKMFQCPYFFFDVVYIHNGTEEKEEEMSWKDMF-AFM 268
+ R IL+Y R +P P+ + F DV+Y+H ++ + +D++ + +
Sbjct: 139 IFRVILIYCRSSMRPTHEW--PLNQKL----FTLDVMYLH---DKPSPDNCPQDVYDSLV 189

Query: 269 GSLD--TKGTSYKYEVALAGPALELHNCMAKLLAHPLQRPCQ 308
+++ ++ Y +E G A + M+ LL HP QR Q
Sbjct: 190 DAVEHVSEYEGYIFESG-QGLARSVFKPMSMLLTHPQQRCAQ 230

Pedant information for DKFZphfbr2_78n23, frame 2 



Report for DKFZphfbr2_78n23.2

[LENGTH] 329

[MW] 36560.10

[pI] 4.60

[HOMOL] PIR:T05304 hypothetical protein F26P21.80 - Arabidopsis thaliana 7e-07

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 9.73 %

SEQ MEVAEPSSPTEEEEEEEEHSAEPRPRTRSNPEGAEDRAVGAQASVGSRSEGEGEAASADD 

SEG . xxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhccccccccccccccc 

SEQ GSLNTSGAGPKSWQVPPPAPEVQIRTPRVNCPEKVIICLDLSEEMSLPKLESFNGSKTNA 

SEG 

PRD ccccccccccccccccccccceeeccccccccceeeeeccccccccccccccccccccee 

SEQ LNVSQKMIEMFVRTKHKIDKSHEFALVVVNDDTAWLSGLTSDPRELCSCLYDLETASCST

SEG 

PRD ehhhhhhhhhhhhhhhccccccceeeeeeccchhhhhcccccchhhhhhhhhcccccccc 

SEQ FNLEGLFSLIQQKTELPVTENVQTIPPPYVVRTILVYSRPPCQPQFSLTEPMKKMFQCPY

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhcccccccccceeeeeeecccccccccccchhhhhheeee 

SEQ FFFDVVYIHNGTEEKEEEMSWKDMFAFMGSLDTKGTSYKYEVALAGPALELHNCMAKLLA 

SEG 

PRD eeeeeeeeccccchhhhhhhhhhhhhhhhcccccccceeeeecccccchhhhhhhhhhhh 

SEQ HPLQRPCQSHASYSLLEEEDEAIEVEATV 

SEG xxxxxxxxxx. . . 

PRD hcccccccccchhhhhhhhhhhhhhhccc 
(No Pfam data available for DKFZphfbr2_78n23.2)
DKFZphfbr2_7a24



group: brain derived 

DKFZphfbr2_7a24 encodes a novel 142 amino acid protein with similarity to the C-terminal part 
of transforming growth factor-beta activated kinases. 

The novel protein shows similarity only to the C-terminus of such kinases; no kinase domain is
present.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to C-terminus of TGF-beta-activated kinase 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown

Insert length: 1697 bp 

No poly A stretch found, no polyadenylation signal found 



1 GGGGAGAGAG GGGTTGTGAA GGGAAGCGGA AGGGAAGGGA AGGGAGGTCC 

51 CGTGGGACGC TGGGGTCTGG GGTAGAGCAG GTAGCAGCGT GCTGCCCTGA 

101 CAGCTGTCTC CGCTCCTCAG ATTGTCAGTG GCTGCTATGC AGCAGGTGCA 

151 GCCTGGTCTC TCACTGAGTC TCTACTCCAC AAAGGCAACG ACTGGCCAAG 

201 GCAGTGGCTG GCTCTGGGTT ACACAAGTGC AGACACTCAA CTAAGTGAGC 

251 TGGAAGACCC AGGAGAAGGC GGAGGCTCAG GTGCCCACAT GATCAGCACA 

301 GCCAGGGTAC CTGCTGACAA GCCTGTACGC ATCGCCTTTA GCCTCAATGA 

351 CGCCTCAGAT GATACACCCC CTGAAGACTC CATTCCTTTG GTCTTTCCAG 

401 AATTAGACCA GCAGCTACAG CCCCTGCCGC CTTGTCATGA CTCCGAGGAA 

451 TCCATGGAGG TGTTCAGACA GCACTGCCAA ATAGCAGAAG AATACCTTGA 

501 GGTCAAAAAG GAAATCACCC TGCTTGAGCA AAGGAAGAAG GAGCTCATTG 

551 CCAAGTTAGA TCAGGCAGAA GAGGAGAAGG TGGATGCTGC TGAGCTGGTT 

601 CGGGAATTCG AGGCTCTGAC GGAGGAGAAT CGGACGTTGA GGTTGGCCCA 

651 GTCTCAATGT GTGGAACAAC TGGAGAAACT TCGAATACAG TATCAGAAGA 

701 GGCAGGGCTC GTCCTAACTT TAAATTTTTC AGTGTGAGCA TACGAGGCTG 

751 ATGACTGCCC TGTGCTGGCC AAAAGATTTT TATTTTAAAT GAATAGTGAG 

801 TCAGATCTAT TGCTTCTCTG TATTACCCAC ATGACAACTG TCTATAATGA 

851 GTTTACTGCT TGCCAGCTTC TAGCTTGAGA GAAGGGATAT TTTAAATGAG 

901 ATCATTAACG TGAAACTATT ACTAGTATAT GTTTTTGGAG ATCAGAATTC 

951 TTTTCCAAAG ATATATGTTT TTTTCTTTTT TAGGAAGATA TGATCATGCT 

1001 GTACAACAGG GTAGAAAATG GTAAAAATAG ACTATTGACT GACCCAGCTA

1051 AGAATCGCGG GCTGAGCAGA GTTAAACCAT GGGACAAACC CATAACATGT 

1101 TCACCATAGT TTCACGTATG TGTATTTTTA AATTTCATGC CTTTAATATT 

1151 TCAAATATGC TCAAATTTAA ACTGTCAGAA ACTTCTCTGC ATGTATTTAT 

1201 ATTTGCCAGA GTATAAACTT TTATACTCTG ATTTTTATCC TTCAATGATT 

1251 GATTATACTA AGAATAAATG GTCACATATC CTAAAAGCTT CTTCATGAAA 

1301 TTATTAGCAG AAACCATGTT TGAAACCAAA GCACATTTGC CAATGCTAAC 

1351 TGGCTGTTGT AATAATAAAC AGATAAGGCT GCATTTGCTT CATGCCATGT 

1401 GACCTCACAG TAAACATCTC TGCCTTTGCC TGTGTGTGTT CTGGGGGAGG 

1451 GGGGACATGG AAAAATATTG TTTGGACATT ACTTGGGTGA GTGCCCATGA 

1501 AGACATCAGT GAACTTGTAA CTATTGTTTT GTTTTGGATT TAAGGAGATG 

1551 TTTTAGATCA GTAACAGCTA ATAGGAATAT GCGAGTAAAT TCAGAATTGA 

1601 AACAATTTCT CCTTGTTCTA CCTATCACCA CATTTTCTCA AATTGAACTC 

1651 TTTGTTATAT GTCCATTTCT ATTCATGTAA CTTCTTTTTC ATTAAAC 



BLAST Results 



No BLAST result 



Medline entries 



98130593: 

Role of TAK1 and TAB1 in BMP signaling in early Xenopus
development.
Peptide information for frame 1



ORF from 289 bp to 714 bp; peptide length: 142 
Category: similarity to known protein 



1 MISTARVPAD KPVRIAFSLN DASDDTPPED SIPLVFPELD QQLQPLPPCH
51 DSEESMEVFR QHCQIAEEYL EVKKEITLLE QRKKELIAKL DQAEEEKVDA
101 AELVREFEAL TEENRTLRLA QSQCVEQLEK LRIQYQKRQG SS 
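The ORF coordinates can also be checked against the DKFZphfbr2_7a24 cDNA listing above. This minimal Python sketch assumes the standard genetic code and the coordinate convention inferred earlier (ORF from the A of the start ATG at base 289 to the last base of the final sense codon at base 714). It translates two excerpts copied from the listing: bases 289-306 and bases 703-717. The `translate` helper is illustrative, not a library call.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Excerpts from the DKFZphfbr2_7a24 cDNA listing (1-based positions).
head = "ATGATCAGCACAGCCAGG"  # bases 289-306: first six codons of the ORF
tail = "CAGGGCTCGTCCTAA"     # bases 703-717: last four sense codons plus the next codon

print(translate(head))  # MISTAR
print(translate(tail))  # QGSS*
```

The head reproduces the first residues of the listed peptide, and the tail ends in a TAA stop immediately after base 714, consistent with the stop codon being excluded from the reported ORF span.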

BLASTP hits

Entry U92030_1 from database TREMBL:

product: "TAK1"; Xenopus laevis TGF-beta-activated kinase TAK1 mRNA,
complete cds.

Score = 343, P = 1.3e-30, identities = 69/143, positives = 104/143

Entry AB009356_1 from database TREMBL:

product: "TGF-beta activated kinase 1a"; Homo sapiens mRNA for
TGF-beta activated kinase 1a, complete cds.

Score = 339, P = 2.6e-30, identities = 67/143, positives = 104/143

Entry MMPK_1 from database TREMBL:

product: "TAK1 (TGF-beta-activated kinase)"; Mouse mRNA for TAK1
(TGF-beta-activated kinase), complete cds.

Score = 339, P = 2.6e-30, identities = 67/143, positives = 104/143

Entry AB009357_1 from database TREMBL:

product: "TGF-beta activated kinase 1b"; Homo sapiens mRNA for
TGF-beta activated kinase 1b, complete cds.

Score = 339, P = 3.2e-30, identities = 67/143, positives = 104/143

Entry AB009358_1 from database TREMBL:

product: "TGF-beta activated kinase 1c"; Homo sapiens mRNA for
TGF-beta activated kinase 1c, complete cds.

Score = 144, P = 3.8e-09, identities = 30/67, positives = 47/67

Alert BLASTP hits for DKFZphfbr2_7a24, frame 1

PIR:JC5955 transforming growth factor-beta activated kinase (EC
-.-.-.-) 1a - Human, N = 1, Score = 339, P = 3e-30

>PIR:JC5955 transforming growth factor-beta activated kinase (EC -.-.-.-) 1a
- Human

Length = 579

HSPs:

Score = 339 (50.9 bits), Expect = 3.0e-30, P = 3.0e-30
Identities = 67/143 (46%), Positives = 104/143 (72%)



Query: 1 MISTARVPADKPVRI-AFSLNDASDDTPPEDSIPLVFPELDQQLQPLPPCHDSEESMEVF 59
MI+T+ ++KP R ++ +D++D ++SIP+ + LD QLQPL PC +S+ESM VF
Sbjct: 437 MITTSGPTSEKPTRSHPWTPDDSTDTNGSDNSIPMAYLTLDHQLQPLAPCPNSKESMAVF 496

Query: 60 RQHCQIAEEYLEVKKEITLLEQRKKELIAKLDQAEEEKVDAAELVREFEALTEENRTLRL 119
QHC++A+EY++V+ EI LL QRK+EL+A+LDQ E+++ + + LV+E + L +EN++L
Sbjct: 497 EQHCKMAQEYMKVQTEIALLLQRKQELVAELDQDEKDQQNTSRLVQEHKKLLDENKSLST 556

Query: 120 AQSQCVEQLEKLRIQYQKRQGSS 142
QC +QLE +R Q QKRQG+S
Sbjct: 557 YYQQCKKQLEVIRSQQQKRQGTS 579





Pedant information for DKFZphfbr2_7a24, frame 1 



Report for DKFZphfbr2_7a24.1

[LENGTH] 142

[MW] 16377.53

[pI] 4.64

[HOMOL] TREMBL:U92030_1 product: "TAK1"; Xenopus laevis TGF-beta-activated kinase TAK1 mRNA, complete cds. 6e-26

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 2

[PROSITE] ASN_GLYCOSYLATION 1

[PFAM] TNFR/NGFR cysteine-rich region

[KW] All_Alpha

[KW] LOW_COMPLEXITY 7.04 %

[KW] COILED_COIL 33.10 %

SEQ MISTARVPADKPVRIAFSLNDASDDTPPEDSIPLVFPELDQQLQPLPPCHDSEESMEVFR 
SEG xxxxxxxxxx 



PRD ccccccccccccccccccccccccccccccccccchhhhhhhhcccccccccchhhhhhh 

COILS 

SEQ QHCQIAEEYLEVKKEITLLEQRKKELIAKLDQAEEEKVDAAELVREFEALTEENRTLRLA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhh 

COILS . . .CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ QSQCVEQLEKLRIQYQKRQGSS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccc 

COILS 
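The [KW] LOW_COMPLEXITY and COILED_COIL percentages appear to be the share of residues flagged in the SEG and COILS tracks, taken over the full peptide length: 10 SEG-masked residues of 142 gives 7.04%, and a 47-residue coiled-coil stretch (the count that 33.10% of 142 implies) gives 33.10%. The same rule reproduces 9.73% for DKFZphfbr2_78n23 (32 masked residues of 329) and 4.90% for DKFZphfbr2_7e22 (14 of 286). A Python sketch under that assumption; the track strings are stand-ins holding only the flagged characters, because the column positions of the original tracks were lost in extraction.

```python
def masked_percent(track: str, length: int, flags: str = "xC") -> str:
    """Percentage of residues flagged in an annotation track, to two decimals.

    Assumption: each flag character ('x' in SEG tracks, 'C' in COILS tracks)
    marks one flagged residue, and the base is the full peptide length.
    """
    flagged = sum(track.count(flag) for flag in flags)
    return f"{100 * flagged / length:.2f}"


# Stand-in tracks for DKFZphfbr2_7a24.1 (142 aa); only the flag counts matter here
seg_7a24 = "x" * 10
coils_7a24 = "C" * 47

print(masked_percent(seg_7a24, 142))    # 7.04
print(masked_percent(coils_7a24, 142))  # 33.10
```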



Prosite for DKFZphfbr2_7a24.1

PS00001 114->118 ASN_GLYCOSYLATION PDOC00001
PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005
PS00006 18->22 CK2_PHOSPHO_SITE PDOC00006
PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006
PS00006 77->81 CK2_PHOSPHO_SITE PDOC00006
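The six PROSITE hits reported for DKFZphfbr2_7a24.1 can be reproduced with a short regular-expression scan. This is a sketch: the three patterns are the standard PROSITE definitions for PS00001 (N-{P}-[ST]-{P}), PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]), translated by hand into Python `re` syntax, and the report's `start->end` notation is assumed to be 1-based with an exclusive end, which is the reading that reproduces the listed coordinates.

```python
import re

# PROSITE patterns rewritten as Python regexes; the lookahead lets
# overlapping matches be reported, as a PROSITE scan would.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"(?=(N[^P][ST][^P]))",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"(?=([ST].[RK]))",       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"(?=([ST].{2}[DE]))",    # PS00006: [ST]-x(2)-[DE]
}

# DKFZphfbr2_7a24 frame 1 peptide (142 aa), from the SEQ lines of the Pedant report
PEPTIDE = (
    "MISTARVPADKPVRIAFSLNDASDDTPPEDSIPLVFPELDQQLQPLPPCHDSEESMEVFR"
    "QHCQIAEEYLEVKKEITLLEQRKKELIAKLDQAEEEKVDAAELVREFEALTEENRTLRLA"
    "QSQCVEQLEKLRIQYQKRQGSS"
)


def scan(sequence):
    """Yield (pattern name, 1-based start, exclusive end) for every match."""
    for name, regex in PATTERNS.items():
        for match in re.finditer(regex, sequence):
            start = match.start(1) + 1
            yield name, start, start + len(match.group(1))


for name, start, end in scan(PEPTIDE):
    print(f"{start}->{end} {name}")
```

Run as written, the scan prints 114->118, 4->7, 116->119, 18->22, 26->30 and 77->81, the same six sites as the table.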



Pfam for DKFZphfbr2_7a24.1

HMM_NAME TNFR/NGFR cysteine-rich region

HMM      *CpeGtYtDWNHvpqClpCtrCePEMGQYMvqPCTwTQNTVC*
          C++++ + + +Q C++ E+ ++++++ T + ++
Query 49  CHDSEESMEVF-RQH--CQIAEE--YLEVKKEITLLEQRKK 84
DKFZphfbr2_7e22
group: brain derived 

DKFZphfbr2_7e22.2 encodes a novel 286 amino acid protein similar to b561 cytochromes.

The new protein shows strong similarity to b561 cytochromes, but contains no heme-binding
site. In addition, a myc-type helix-loop-helix dimerization domain is present. This
helix-loop-helix domain mediates protein dimerization and has been found in proteins such as
the myc family of cellular oncogenes, proteins involved in myogenesis, and vertebrate proteins
that bind specific DNA sequences in various immunoglobulin chain enhancers.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 

strong similarity to cytochrome b561 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 4254 bp 

Poly A stretch at pos. 4234, polyadenylation signal at pos. 4217

1 GGGGACTACC CAGAGGGCTG CCGCCGCCTC TCCAAGTTCT TGTGGCCCCC 
51 GCGGTGCGGA GTATGGGGCG CTGATGGCCA TGGAGGGCTA CCGGCGCTTC 

101 CTGGCGCTGC TGGGGTCGGC ACTGCTCGTC GGCTTCCTGT CGGTGATCTT 

151 CGCCCTCGTC TGGGTCCTCC ACTACCGAGA GGGGCTTGGC TGGGATGGGA 

201 GCGCACTAGA GTTTAACTGG CACCCAGTGC TCATGGTCAC CGGCTTCGTC 

251 TTCATCCAGG GCATCGCCAT CATCGTCTAC AGACTGCCGT GGACCTGGAA 

301 ATGCAGCAAG CTCCTGATGA AATCCATCCA TGCAGGGTTA AATGCAGTTG 

351 CTGCCATTCT TGCAATTATC TCTGTGGTGG CCGTGTTTGA GAACCACAAT 

401 GTTAACAATA TAGCCAATAT GTACAGTCTG CACAGCTGGG TTGGACTGAT 

451 AGCTGTCATA TGCTATTTGT TACAGCTTCT TTCAGGTTTT TCAGTCTTTC 

501 TGCTTCCATG GGCTCCGCTT TCTCTCCGAG CATTTCTCAT GCCCATACAT 

551 GTTTATTCTG GAATTGTCAT CTTTGGAACA GTGATTGCAA CAGCACTTAT 

601 GGGATTGACA GAGAAACTGA TTTTTTCCCT GAGAGATCCT GCATACAGTA 

651 CATTCCCGCC AGAAGGTGTT TTCGTAAATA CGCTTGGCCT TCTGATCCTG 

701 GTGTTCGGGG CCCTCATTTT TTGGATAGTC ACCAGACCGC AATGGAAACG 

751 TCCTAAGGAG CCAAATTCTA CCATTCTTCA TCCAAATGGA GGCACTGAAC 

801 AGGGAGCAAG AGGTTCCATG CCAGCCTACT CTGGCAACAA CATGGACAAA

851 TCAGATTCAG AGTTAAACAA TGAAGTAGCA GCAAGGAAAA GAAACTTAGC 

901 TCTGGATGAG GCTGGGCAGA GATCTACCAT GTAAAATGTT GTAGAGATAG 

951 AGCCATATAA CGTCACGTTT CAAAACTAGC TCTACAGTTT TGCTTCTCCT 
1001 ATTAGCCATA TGATAATTGG GCTATGTAGT ATCAATATTT ACTTTAATCA 
1051 CAAAGGATGG TTTCTTGAAA TAATTTGTAT TGATTGAGGC CTATGAACTG 
1101 ACCTGAATTG GAAAGGATGT GATTAATATA AATAATAGCA GATATAAATT 
1151 GTGGTTATGT TACCTTTATC TTGTTGAGGA CCACAACATT AGCACGGTGC 
1201 CTTGTGCAGA ATAGATACTC AATATGTGAA TATGTGTCTA CTAGTAGTTA 
1251 ATTGGATAAA CTGGCAGCAT CCCTGGCCTG TTGTCATGCA GTCATTTCCT 
1301 GTTAATTCTG GGAGACAATG ATTTCACAAC TAGAGGGAAG CAGTCCTAAA 
1351 AGTTTAAAAT CCGATAAGGA ATATCTGGGA CAGGGTTTAG ATCATGACTC 
1401 TACACAGATA CCATGATGAG AGTATATTAA AGAAATTTAG GAAAGCACCT 
1451 GGTTCCTTTC TCCCCATGCC TGCCTTCTGC TCCCTCCCCA GCTGGTTTGG 
1501 GCTCAAATTG TCCCTGGAGA CTAGGGTTTA TGTTAGGGTA TTGATAGATT
1551 AGAGCAGGTG GTTGAAGAGA TCTTCTCTGG TCAGACTTGG AAGAATTTCC 
1601 AAAAGTGAAG TTAGCCCCAA GACTTCCCTA GGGTTGATGT ACTTTATGAT 
1651 CCAGATGCTA AACTTCTTAG AATGAAAATA TGCTTCAACA CTTAAGTAGC 
1701 ATACACTGCC CTACAAACCT CAGAGAGCAC TTTTCCCCAA GTTCTTGTTT 
1751 TTATTTTTGA AAGTACTCAC ACAGCACTTA CTATGCTCCA AACACTCCTC 
1801 TAAGCACTTT ACACATATTA GCTCATTCAG TCCCCAGACA GACGGGATGA
1851 AGTAGGTATT GTTACTGTTC CCATTTTACA GGTGAGAGAT TTGAAGCCTG 
1901 GGGAGGCTAG TAACTCACCC CAAGGTCACA CGGCTCATAC ATGGTGGGAC 
1951 TGAGACTCAG ATGCAGGCAG TCTGGCACCT CAGTCTGGAT TCTAACCATT 
2001 TCACTAAGCT ATTTTTGTCT TGTACTACTT TGACCCACCC CTGAATAAAC 
2051 CTCAATTGCT GGAGTGGGGT GTAGTTATTA AAGGGATGCT TTTTACCTTT 
2101 TGCTGTCTGC TGTGGCAGAT TCCCCAGATA ACCAAGGAAA AGGGGCCACC 
2151 CATACCTGGA AATAGGCCAT AGGGCCCCTA CTACTGCCAA CAAGCCATGG 
2201 CCTACCTTGA CACTTGTTTG ATCTTAAAAT TGTGTCTTGG TAACAAAAGA 
2251 TTTGGACAGG CATATCTGTA GCTTTCAAGT TAATTAATTG CAATATTTTT 
2301 TTCTTCAGGA TTTTAGCTGC TGAACAACTT TCAGTTTGGA GCTAAAAGAG 
2351 ACCTGTCTCA TGGTCTGCCC TTCCCTGGGG CAATAGCTAG GGTCTTTCCT 
2401 GATTTTTATG GAATTTTAGG GGATATTTTG AGCTTTGGGT TCTCAGTAGT 
2451 GAATTGAGAC TTGGAGGTGA CTTTTCATGT TTGGAGTATC ATCTCTGTCT

2501 GGGCTCTGGG CTGACAAATT AAAACCTAGA GTAGTGCTTA TGCTGAAATG 

2551 ATACTTTTCA TTTTTTGGTT GATTTTTTTG CCTTCCCTTC AATTTTAAAC 

2601 TGAAGCATTT TAATGTGGGT AGAAACTCTA CACCAAATAC ACTAAACATT 

2651 TTGGTGCTTA GTGGATTTCT TTTTAGGTAA CTGGTACTTA CTTCCAAAGA 

2701 CTGAATACAA GCCACACTCC ATCATATCCC TTAAACTTCA TGAAAAACCA 

2751 TTCAAGATCC CCTTGCTGCA ACACTGTTCT CTTCTTCTCT ACTAAATTCT 

2801 ATTTCCAAAA TTGGTAATAG AGCCAGAAGG ATCCCCAGTA CCCAGCCCTC 

2851 TGCCTGGCAC AAAGTGGTAG CACAATTAAA TTCAGTATGG GTGGAGCATG 

2901 GTACAGTCTT GGTGCCATAG AAGGAGTAGT TGCATAGTCA CACATCATTT 

2951 GATAAGTTGG ATGTTCCATT ACATAGAGGA ACACAAAATT CCAGGGTTTT 

3001 TGGAGGAAGG GATTAGATAG CGACTAAGCC GCCAGAATTG AGGTGGCCAT 

3051 TCCTTTTTGT ATAGGCTAAG AAACAGGTTA TCAGTGAAAA GTTAATTATG 

3101 GCTTTGGCAC TAGAATAGCA CTGTTGCAAA GTATTTAAGC ACCCCCCATC 

3151 TCAGCCCTTT ATTTTATCTT TCATGTGGGC TAATGTGAGG ATAATCTTAC 

3201 AGATATTATA GGAATTTCTT TTCTATCTTT ATGAAAACAA CGTATATAAA 

3251 ATATATCTAG AAAACCTTTG TTTGAGACTC TTATTTAATG GGCTTTTGAT 

3301 TCTAATGATA ATTGTACCTT TATCTTTCAA AAGCTGATAT TTCCTACCTA 

3351 AGCATCTCCC GAGAAAAATA TCTCATTAAA AAGCCCATAA ATAATAGGGG 

3401 AGAAGAAAGC CTTAGGTATC AATTCCAAAA CAGTGATTGA AATTTCCCAA 

3451 AATAATTATG GCTTCTGTCA TCTCCAGAGA TAATCTGGCT TGGTTTACCC 

3501 CATAATCTAA TTTCAGAAAA GAAAGCTTTA TTTTAACACT CATCTGAATC 

3551 AACATTAAAG CCTTTTCTCT CAAAGCGTTT ATTGAGAAAC TCAAATGAAT 

3601 ATACTTTTTG AATTACTGTC ATCAAAAGTG TACGGCTTCC TGTGCTGCTT 

3651 GTGTCAAATG GAACCTGCCC TCTAAAGCAC TTTCTTTCCT TTACTTGCGT 

3701 GGTTTCATGT AAGCTGTGCT GTTTAGAAAC AACATCTCAG ACTTTACAAA 

3751 GAAATGACAA AGAAGGCAAT TGCACTTTTT AAGGGATATC GACAAGCAGT 

3801 TTCTGTTTTC TAAAGGACAA AATACAGAGT GTGTGTCATT TTTAATTAGA 

3851 TTCTTTCCCC TGCTGAGTTG GAAATTCCAG TGCAGCACTG ATTGACCACA 

3901 GTTGCCAATC TAAAAGCACA AAGACAGAAG TAAAGCTTTA TGCTAATTTT 

3951 ATTTCAATAT GATAGAAAAT TTATCTTGGT ATGTCCTTTT TTAGATAACT 

4001 CCAGCAGGAA ACTGTAACTG CTATGTCTTT AGGAAAACGT AGAAGAAAGA 

4051 ACATTATTAT TCTTTAATTC CTACAAGGTA CTTGAAAACC TTAAGTGAAA 

4101 AAGATTTCTA TCTTTTTATC TTGGCGCATT TATGGAAAAA ATATTAACTG 

4151 TCCTGAATAT TTTATAATTT TGTAGGAAAA ATATGCATCT ATTTTTTCTT 

4201 GACTTCTTTT ATATAGTAAT AAAAGTTATT TTGGAAAAAA AAAAAAAAAA 

4251 AAAA 
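The polyadenylation-signal positions given in this section (1836 for DKFZphfbr2_78k24, 1353 for DKFZphfbr2_78n23 and 4217 here) each point to the first base of an AATAAA hexamer when read as 0-based offsets, i.e. one less than the 1-based numbering of the sequence rows; the ORF coordinates, by contrast, are 1-based. This convention is inferred from the listings, not stated in them. A minimal Python sketch under that assumption, scanning a single listed row; the function name is illustrative:

```python
def polya_signal_offset(row: str, row_start: int, motif: str = "AATAAA") -> int:
    """0-based offset of the first polyadenylation-signal hexamer in a sequence row.

    `row` is one row of a cDNA listing (spaces allowed) whose first base sits at
    1-based position `row_start`. Returns -1 when the motif is absent.
    """
    bases = row.replace(" ", "").upper()
    hit = bases.find(motif)
    if hit < 0:
        return -1
    # 1-based row start minus one gives the row's 0-based offset; add the hit index
    return (row_start - 1) + hit


# Row 4201 of the DKFZphfbr2_7e22 listing above
row_4201 = "GACTTCTTTT ATATAGTAAT AAAAGTTATT TTGGAAAAAA AAAAAAAAAA"
print(polya_signal_offset(row_4201, 4201))  # 4217, the listed signal position
```

Applied to row 1801 of DKFZphfbr2_78k24 and row 1351 of DKFZphfbr2_78n23, the same function returns 1836 and 1353.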



BLAST Results 



Entry HSG20626 from database EMBL: 
human STS A005Z27. 
Score = 860, P = 3.0e-32, identities = 176/181



Medline entries 



89030633: 

The structure of cytochrome b561, a secretory vesicle-specific electron 
transport protein. 



Peptide information for frame 2 



ORF from 74 bp to 931 bp; peptide length: 286 
Category: strong similarity to known protein 
Classification: unset 



1 MAMEGYRRFL ALLGSALLVG FLSVIFALVW VLHYREGLGW DGSALEFNWH 

51 PVLMVTGFVF IQGIAIIVYR LPWTWKCSKL LMKSIHAGLN AVAAILAIIS 

101 VVAVFENHNV NNIANMYSLH SWVGLIAVIC YLLQLLSGFS VFLLPWAPLS

151 LRAFLMPIHV YSGIVIFGTV IATALMGLTE KLIFSLRDPA YSTFPPEGVF 

201 VNTLGLLILV FGALIFWIVT RPQWKRPKEP NSTILHPNGG TEQGARGSMP 

251 AYSGNNMDKS DSELNNEVAA RKRNLALDEA GQRSTM 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_7e22, frame 2 
SWISSPROT:C561 SHEEP CYTOCHROME B561 (CYTOCHROME B-561)., N - 1, Score 
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- 460, P = 1.3e-43 

PIR:S01167 cytochrome b561 - bovine, N = 1, Score = 457, p 

SWISSPROT:C561_PIG CYTOCHROME B561 (CYTOCHROME B-561)., N = 
452, P = 9.1e-43 

PIR:S53321 cytochrome B561 - human, N - 1, Score « 451, P = 



>SWISSPROT:C561_SHEEP CYTOCHROME B561 {CYTOCHROME B-561). 
Length = 252 

HSPs: 

Score = 460 (69.0 bits), Expect = 1.3e-43, P » 1.3e-43 
Identities = 96/218 (44%), Positives - 131/218 (60%) 



Query: 18 LVGFLSVIFALVWVLHYREGLGWDGSALEFNWHPVLMVTGFVFIQGIAIIVYRLPWTWKC 77 

L+G V W+ YR G+ W+ SAL+FN HP+ MV G VF+QG A++VYR+ 

Sbjct: 23 LLGLTVVAMTGAWLGMYRGGIAWE-SALQFNVHPLCMVIGLVFLQGDALLVYRV — FRNE 19 

Query: 78 SKLLMKSIHAGLNAVAAILAIISWAVFENHNVNNIANMYSLHSWVGLIAVICYLLQLLS 137 

+K K +H L+ A ++A++ +VAVFE+H A++YSLHSW G++ + Q L 

Sbjct: 80 AKRTTKVLHGLLHVFAFVIALVGLVAVFEHHRKKGYADLYSLHSWCGILVFALFFAQWLV 139 

Query: 138 GFSVFLLPWAPLSLRAFLMPIHVYSGIVIFGTVIATALMGLTEKLIFSLRDPAYSTFPPE 197

GFS FL P A SLR+ P HV+ G IF +ATAL+GL E L+F L YSTF PE 
Sbjct: 140 GFSFFLFPGASFSLRSRYRPQHVFFGAAIFLLSVATALLGLKEALLFEL-GTKYSTFEPE 198 

Query: 198 GVFVNTLGLLILVFGALIFWIVTRPQWKRPKEPNSTIL 235 

GV N LGLL+ F ++ +I+TR WKRP + L 
Sbjct: 199 GVLANVLGLLLAAFATVVLYILTRADWKRPLQAEEQAL 236 

Pedant information for DKFZphfbr2_7e22, frame 2 



Report for DKFZphfbr2_7e22.2

[LENGTH] 286

[MW] 31638.58

[pI] 9.12

[HOMOL] SWISSPROT:C561_SHEEP CYTOCHROME B561 (CYTOCHROME B-561). 4e-40

[PIRKW] transmembrane protein 9e-40

[KW] SIGNAL_PEPTIDE 40

[KW] TRANSMEMBRANE 5

[KW] LOW_COMPLEXITY 4.90 %
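The LOW_COMPLEXITY figure appears to be the share of residues that SEG masks (the 'x' marks in the SEG track) relative to the peptide length. This is an inference from the numbers rather than documented Pedant behaviour. The SEG track of this report masks 14 residues, and 14/286 = 4.90%. The same rule reproduces 14.59% (34/233) and the COILED_COIL value of 13.73% (32 'C' marks out of 233) for DKFZphfbr2_7j4.3. A sketch with a hypothetical helper name:

```python
def flagged_percent(track: str, length: int, flag: str = "x") -> str:
    """Percentage of residues carrying `flag` in an annotation track, formatted as in the report."""
    if length <= 0:
        raise ValueError("peptide length must be positive")
    return f"{100 * track.count(flag) / length:.2f}"


if __name__ == "__main__":
    # The SEG rows of DKFZphfbr2_7e22.2 contain a single masked stretch of 14 residues;
    # spaces, dots and other characters in a joined track are simply not counted.
    seg_track = "x" * 14
    print(flagged_percent(seg_track, 286))
```

The per-row SEG or COILS lines of a report can be concatenated before counting, because only the flag character is tallied.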

SEQ MAMEGYRRFLALLGSALLVGFLSVIFALVWVLHYREGLGWDGSALEFNWHPVLMVTGFVF

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhcchhhhhhhhhccccccccccccccccchhhhhhhhh 

MEM MMMMMMMMMMMM 

SEQ IQGIAIIVYRLPWTWKCSKLLMKSIHAGLNAVAAILAIISVVAVFENHNVNNIANMYSLH 

SEG xxxxxxxxxxxxxx 

PRD ccccceeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccceeecc 

MEM MMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ SWVGLIAVICYLLQLLSGFSVFLLPWAPLSLRAFLMPIHVYSGIVIFGTVIATALMGLTE 

SEG 

PRD cccchhhhhhhhhhhhhhheeeeccccccccccccccceeeeeeeeeeehhhhhhhhhhh 

MEM ....MMMMMMMMMMMMMMMMMMMMM...MMMMMMMMMMMMMMMMMMMMM...

SEQ KLIFSLRDPAYSTFPPEGVFVNTLGLLILVFGALIFWIVTRPQWKRPKEPNSTILHPNGG 

SEG 

PRD hhhhhhhccccccccccchhhhhhhhhhhhhhhheeeeeecccccccccccccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ TEQGARGSMPAYSGNNMDKSDSELNNEVAARKRNLALDEAGQRSTM

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhcccc 

MEM 



(No Prosite data available for DKFZphfbr2_7e22.2) 
(No Pfam data available for DKFZphfbr2_7e22.2)



= 2.7e-43 
1, Score - 

1.2e-42 
DKFZphfbr2_7j4
group: brain derived 

DKFZphfbr2_7j4 encodes a novel 233 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 

unknown 

complete cDNA, complete cds, 1 EST hit 

Sequenced by GBF 

Locus: unknown

Insert length: 1050 bp 

Poly A stretch at pos. 1027, polyadenylation signal at pos. 1007
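Both positions can be found by scanning the sequence. The poly(A) stretch is the terminal run of A's, and the signal is the canonical AATAAA hexamer upstream of it. Judging from the sequence below, the reported positions appear to be 0-based offsets: counting from 1, AATAAA begins at base 1008 and the terminal A run at base 1028. The same one-base offset also fits the poly(A) data of the later entries in this section. The sketch makes that assumption, and it ignores variant signals such as ATTAAA. Function names are hypothetical:

```python
def polya_start(seq: str) -> int:
    """0-based index where the terminal run of A's begins (len(seq) if there is none)."""
    i = len(seq)
    while i > 0 and seq[i - 1] in "Aa":
        i -= 1
    return i


def polya_signal(seq: str, motif: str = "AATAAA") -> int:
    """0-based index of the last `motif` lying fully upstream of the poly(A) tail, or -1."""
    return seq.upper().rfind(motif, 0, polya_start(seq))


if __name__ == "__main__":
    # Bases 1001-1050 of DKFZphfbr2_7j4; the offset converts to whole-sequence coordinates
    tail = "ACCATCTAAT" "AAATAATTGG" "CCAAGTTAAA" "AAAAAAAAAA" "AAAAAAAAAA"
    offset = 1000
    print("poly(A) at", offset + polya_start(tail))
    print("signal at", offset + polya_signal(tail))
```

The script prints 1027 and 1007, matching the header line. The search runs backwards from the tail on purpose: the full 7j4 sequence contains an earlier AATAAA near position 431 inside the coding region, and a forward search would pick that one up instead.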

1 GGGGACACAA AGGGGTGGTC ACCCTGCCCT CACCTTGACC TGTAAGTTGC 
51 CTAGGACAGT GGCCTGGTCC CAGGGGCTGT TGTGGGGAGT TGAAGAACAC 
101 CCTGGCCTCC TCCATCATGT CGGCCAAGAG GGCAGAATTG AAGAAAACAC 
151 ATCTGTGCAA GAACTACAAG GCAGTTTGCC TGGAATTGAA GCCAGAGCCG 
201 ACCAAAACAT TTGATTACAA AGCAGTTAAA CAAGAAGGGC GGTTTACCAA 
251 AGCAGGAGTG ACACAGGACC TAAAGAATGA ACTCAGGGAA GTGAGAGAAG 
301 AGCTCAAGGA GAAAATGGAG GAGATAAAAC AGATAAAGGA TCTAATGGAC 
351 AAGGATTTTG ATAAACTTCA CGAATTTGTG GAAATTATGA AGGAAATGCA 
401 GAAAGATATG GATGAGAAGA TGGACATTTT AATAAATACA CAGAAGAACT 
451 ATAAGCTTCC CCTTAGAAGA GCACCAAAGG AGCAGCAGGA ACTCAGGCTG 
501 ATGGGAAAGA CTCACAGAGA ACCACAGCTC AGGCCCAAGA AAATGGATGG 
551 AGCCAGTGGA GTCAATGGAG CACCCTGTGC TCTTCACAAG AAGACGATGG 
601 CACCACAAAA AACAAAACAG GGCTCACTGG ATCCCCTTCA TCACTGTGGG 
651 ACCTGCTGCG AGAAATGTTT GTTGTGTGCT CTAAAGAACA ACTACAATCG 
701 GGGGAACATT CCTTCAGAGG CCTCAGGCCT TTACAAAGGT GGAGAGGAGC 
751 CAGTGACCAC CCAACCTTCT GTGGGCCACG CTGTGCCTGC CCCAAAGTCC 
801 CAGACTGAGG GAAGGTGAAG CTTAACTGCC AGCTTGAAAT GAGAGTAAAG 
851 AAGATACAGA GCAAACAGTG TTTCAGAAAC TGTCCTGCCC TGGGTGTGAT 
901 TCTTTGGCTT CAATTTGAAG GAGGAGGAAT GATGGGATTT CATATTTTAT 
951 TTCACACCAG TTCCTCCTTG TTTCATCTCT TTGCTAAGCT GGCTGCTTCT 
1001 ACCATCTAAT AAATAATTGG CCAAGTTAAA AAAAAAAAAA AAAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 117 bp to 815 bp; peptide length: 233 
Category: putative protein 
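The peptide follows from the cDNA when codons are read with the standard genetic code, starting at the 1-based ORF start. The sketch below builds the codon table from the conventional TCAG ordering of the standard code. Helper names are hypothetical, and 'X' marks any codon that contains a letter other than A, C, G or T:

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA string codon by codon; a trailing partial codon is ignored."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[p:p + 3], "X") for p in range(0, len(cds) - 2, 3))


def orf_peptide(seq: str, start: int, end: int) -> str:
    """Translate the ORF given by 1-based, inclusive coordinates."""
    return translate(seq[start - 1:end])
```

Applied to bases 117-146 of the sequence above, it returns MSAKRAELKK, the first ten residues of the listed peptide. The codon that follows position 815 (TGA) translates to '*'.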



1 MSAKRAELKK THLCKNYKAV CLELKPEPTK TFDYKAVKQE GRFTKAGVTQ 
51 DLKNELREVR EELKEKMEEI KQIKDLMDKD FDKLHEFVEI MKEMQKDMDE 
101 KMDILINTQK NYKLPLRRAP KEQQELRLMG KTHREPQLRP KKMDGASGVN 
151 GAPCALHKKT MAPQKTKQGS LDPLHHCGTC CEKCLLCALK NNYNRGNIPS 
201 EASGLYKGGE EPVTTQPSVG HAVPAPKSQT EGR 

BLASTP hits 

Entry JC2223 from database PIR: 

major surface glycoprotein 3 - Pneumocystis carinii (fragment) 
Score = 109, P = 3.5e-04, identities = 41/136, positives = 67/136
Alert BLASTP hits for DKFZphfbr2_7j4, frame 3

TREMBLNEW:PCP115C_1 product: "P115C"; Pneumocystis carinii mRNA for
P115C, partial sequence., N = 1, Score = 109, P = 0.00024



>TREMBLNEW:PCP115C_1 product: "P115C"; Pneumocystis carinii mRNA for P115C, 
partial sequence. 

Length = 196 

HSPs: 

Score = 109 (16.4 bits), Expect = 2.4e-04, P = 2.4e-04
Identities = 41/134 (30%), Positives = 67/134 (50%)
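The percentages next to Identities and Positives appear to be truncated, not rounded. This is inferred from the reports in this section: 41/134 is 30.6% but is printed as 30%, and 259/289 is 89.6% but is printed as 89%. A one-line sketch of that convention (hypothetical name):

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Percentage as printed beside Identities/Positives here: truncated toward zero."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    return 100 * matches // aligned


if __name__ == "__main__":
    for matches, aligned in [(41, 134), (96, 218), (259, 289), (20, 41)]:
        exact = 100 * matches / aligned
        print(f"{matches}/{aligned}: exact {exact:.1f}%, printed {blast_percent(matches, aligned)}%")
```

Integer floor division avoids floating-point edge cases. Rounding would instead give 31% and 90% for the two examples above.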

Query: 14 CKN-YKAVCLELKPEPTKTFDYKAVKQEGRFTKA-GVTQDLKNELREVREELKEKMEEIK 71 

CK K C ELK + K VK+ TK G ++LK+++++ E KE++E K 

Sbjct: 22 CKTELKKYCEELKEADGLKVNDK-VKEICDDTKRDGKCKELKDKVKKELETFKEELE--K 78

Query: 72 QIKDLMDKDFDKLHEFVEIMKEMQKDMDEKMDILINTQKNYKLPLRRAPKEQQELRLMGK 131 

+KD+ D++ +K E +++E D D K + + + YKL +R E LR +GK 
Sbjct: 79 ALKDIKDENCEKYEEKCILLEETNHD-DVKKNCVKLREGCYKLKRKRVA-EDLLLRALGK 136 



Query: 132 THREPQLRPKKMDGAS 147

+ + K D S 
Sbjct: 137 DVKNGECEKKMKDVCS 152 



Pedant information for DKFZphfbr2_7j4, frame 3 



Report for DKFZphfbr2_7j4.3

[LENGTH] 233

[MW] 26533.95

[pI] 9.18

[PROSITE] MYRISTYL 3

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 3

[KW] All_Alpha

[KW] LOW_COMPLEXITY 14.59 %

[KW] COILED_COIL 13.73 %



SEQ MSAKRAELKKTHLCKNYKAVCLELKPEPTKTFDYKAVKQEGRFTKAGVTQDLKNELREVR 

SEG xxxxxxxxx 

PRD ccchhhhhhhhhhccchhhhhhhcccccccccccceeecccccccccccchhhhhhhhhh 

COILS CCCCCCCCCCCC 



SEQ EELKEKMEEIKQIKDLMDKDFDKLHEFVEIMKEMQKDMDEKMDILINTQKNYKLPLRRAP

SEG xxxxxxxxx xxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhchhhhhhhhhcccccccccccc 

COILS CCCCCCCCCCCCCCCCCCCC 

SEQ KEQQELRLMGKTHREPQLRPKKMDGASGVNGAPCALHKKTMAPQKTKQGSLDPLHHCGTC 

SEG 

PRD hhhhhhhhhccccccccccccccccccccccccchhhhhhcccccccccccccccccccc 

COILS 



SEQ CEKCLLCALKNNYNRGNIPSEASGLYKGGEEPVTTQPSVGHAVPAPKSQTEGR 

SEG 

PRD chhhhhhhccccccccccccccccccccccccccccccccccccccccccccc 

COILS 



Prosite for DKFZphfbr2_7j4.3

PS00005   2->5       PKC_PHOSPHO_SITE   PDOC00005
PS00005   108->111   PKC_PHOSPHO_SITE   PDOC00005
PS00005   132->135   PKC_PHOSPHO_SITE   PDOC00005
PS00006   132->136   CK2_PHOSPHO_SITE   PDOC00006
PS00006   179->183   CK2_PHOSPHO_SITE   PDOC00006
PS00006   228->232   CK2_PHOSPHO_SITE   PDOC00006
PS00008   151->157   MYRISTYL           PDOC00008
PS00008   196->202   MYRISTYL           PDOC00008
PS00008   204->210   MYRISTYL           PDOC00008



(No Pfam data available for DKFZphfbr2_7j4.3)
DKFZphfbr2_82c20



group: transmembrane protein 

DKFZphfbr2_82c20 encodes a novel 492 amino acid protein with very weak similarity to C. 
elegans cosmid D1007. 

The novel protein contains 7 transmembrane regions. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



similarity to C. elegans D1007.5;
membrane regions: 7

Summary DKFZphfbr2_82c20 encodes a novel 492 amino acid protein with
similarity to a hypothetical C. elegans protein. 



similarity to C. elegans D1007.5 

complete cDNA (bp 1-100 GC rich), complete cds,
potential start at bp 128 matches Kozak consensus PuNNatgG,
EST hits, localisation? primer B of the STS does not match perfectly!
TRANSMEMBRANE 7 
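The Kozak remark can be checked mechanically. In a strong Kozak context, a purine (A or G) sits three bases upstream of the ATG (position -3) and a G immediately follows it (position +4). For the start at bp 128, the sequence below reads GAGATGG, i.e. G at -3 and G at +4. A minimal sketch (the function name is hypothetical):

```python
PURINES = {"A", "G"}


def kozak_context(seq: str, atg_pos: int) -> dict:
    """Report the -3 and +4 bases around a 1-based ATG start and whether both fit the consensus."""
    i = atg_pos - 1  # 0-based index of the A in ATG
    if seq[i:i + 3].upper() != "ATG":
        raise ValueError(f"no ATG at position {atg_pos}")
    minus3 = seq[i - 3].upper() if i >= 3 else ""
    plus4 = seq[i + 3].upper() if i + 3 < len(seq) else ""
    return {"minus3": minus3, "plus4": plus4, "strong": minus3 in PURINES and plus4 == "G"}


if __name__ == "__main__":
    # Bases 101-131 of DKFZphfbr2_82c20, padded so that string positions match bp numbers
    seq = "N" * 100 + "CGAAGCGGAG" "AGCACCGGGG" "GGAGGAGATG" "G"
    print(kozak_context(seq, 128))
```

The sketch checks only the two most influential positions. The fuller consensus (gccRccATGG) also weights the other upstream bases.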

Sequenced by DKFZ 

Locus: /map="109.9 cR from top of Chr1 linkage group"???
Insert length: 1804 bp 

Poly A stretch at pos. 1794, no polyadenylation signal found 



1 CGGCGGGAGC GCGCGGCTGA TACCCGGGAC TGGGCTGCGG CGGTTAGTCC 
51 TCTCCCGGCC GCCGTCGCCT CCGACATATT GCTCGCAGGA GCTGCGGCGG 
101 CGAAGCGGAG AGCACCGGGG GGAGGAGATG GGAGGACGAA GAGGTCCCAA 
151 CAGGACATCT TACTGTCGAA ATCCGCTCTG TGAGCCGGGA TCCTCGGGGG 
201 GCTCTAGTGG AAGCCACACT TCCAGTGCAT CGGTGACCAG TGTTCGTTCC 
251 CGCACCAGGA GCAGTTCTGG AACAGGCCTC TCCAGCCCTC CTCTGGCCAC 
301 CCAAACTGTT GTGCCTCTAC AGCACTGCAA GATCCCCGAG CTGCCAGTCC 
351 AGGCCAGCAT TCTGTTTGAG TTGCAGCTCT TCTTCTGCCA GCTCATAGCA 
401 CTCTTCGTCC ACTACATCAA CATCTACAAG ACAGTGTGGT GGTATCCACC 
451 TTCCCACCCA CCCTCCCACA CCTCCCTGAA CTTCCATCTG ATCGACTTCA
501 ACTTGCTGAT GGTGACCACC ATCGTTCTGG GCCGCCGCTT CATTGGGTCC 
551 ATCGTGAAGG AGGCCTCTCA GAGGGGGAAG GTCTCCCTCT TTCGCTCCAT 
601 CCTGCTGTTC CTCACTCGCT TCACCGTTCT CACGGCAACA GGCTGGAGTC 
651 TGTGCCGATC CCTCATCCAC CTCTTCAGGA CCTACTCCTT CCTGAACCTC 
701 CTGTTCCTCT GCTATCCGTT TGGGATGTAC ATTCCGTTCC TGCAGCTGAA 
751 TTGCGACCTC CGCAAGACAA GCCTCTTCAA CCACATGGCC TCCATGGGGC 
801 CCCGGGAGGC GGTCAGTGGC CTGGCAAAGA GCCGGGACTA CCTCCTGACA 
851 CTGCGGGAGA CGTGGAAGCA GCACACAAGA CAGCTGTATG GCCCGGACGC 
901 CATGCCCACC CATGCCTGCT GCCTGTCACC CAGCCTCATC CGCAGTGAGG 
951 TGGAGTTCCT CAAGATGGAC TTCAACTGGC GCATGAAGGA AGTGCTCGTC 
1001 AGCTCCATGC TGAGCGCCTA CTATGTGGCC TTTGTGCCTG TCTGGTTCGT 
1051 GAAGAACACA CATTACTATG ACAAGCGCTG GTCCTGTGAA CTCTTCCTGC 
1101 TGGTGTCCAT CAGCACCTCC GTGATCCTCA TGCAGCACCT GCTGCCTGCC 
1151 AGCTACTGTG ACCTGCTGCA CAAGGCCGCC GCCCATCTGG GCTGTTGGCA 
1201 GAAGGTGGAC CCAGCGCTGT GCTCCAACGT GCTGCAGCAC CCGTGGACTG 
1251 AAGAATGCAT GTGGCCGCAG GGCGTGCTGG TGAAGCACAG CAAGAACGTC 
1301 TACAAAGCCG TAGGCCACTA CAACGTGGCT ATCCCCTCTG ACGTCTCCCA 
1351 CTTCCGCTTC CATTTCTTTT TCAGCAAACC TCTGCGGATC CTCAACATCC 
1401 TCCTGCTGCT GGAGGGCGCT GTCATTGTCT ATCAGCTGTA CTCCCTAATG 
1451 TCCTCTGAAA AGTGGCACCA GACCATCTCG CTGGCCCTCA TCCTCTTCAG 
1501 CAACTACTAT GCCTTCTTCA AGCTGCTCCG GGACCGCTTG GTATTGGGCA 
1551 AGGCCTACTC ATACTCTGCT AGCCCCCAGA GAGACCTGGA CCACCGTTTC 
1601 TCCTGAGCCC TGGGGTCACC TCAGGGACAG CGTCCAGGCT TCAGCCAAGG 
1651 GCTCCCTGGC AAGGGGCTGT TGGGTAGAAG TGGTGGTGGG GGGGACAAAA 
1701 GACAAAAAAA TCCACCAGAG CTTTGTATTT TTGTTACGTA CTGTTTCTTT 
1751 GATAATTGAT GTGATAAGGA AAAAAGTCCT ATTTTTATAC TCCCAAAAAA 
1801 AAAA 



BLAST Results 



Entry HS285343 from database EMBL: 
human STS WI-17488. 
Score = 1225, P = 1.3e-50, identities = 263/281



Medline entries 



No Medline entry 



Peptide information for frame 2 



1 MGGRRGPNRT SYCRNPLCEP GSSGGSSGSH TSSASVTSVR SRTRSSSGTG 
51 LSSPPLATQT VVPLQHCKIP ELPVQASILF ELQLFFCQLI ALFVHYINIY 
101 KTVWWYPPSH PPSHTSLNFH LIDFNLLMVT TIVLGRRFIG SIVKEASQRG 
151 KVSLFRSILL FLTRFTVLTA TGWSLCRSLI HLFRTYSFLN LLFLCYPFGM 
201 YIPFLQLNCD LRKTSLFNHM ASMGPREAVS GLAKSRDYLL TLRETWKQHT 
251 RQLYGPDAMP THACCLSPSL IRSEVEFLKM DFNWRMKEVL VSSMLSAYYV 
301 AFVPVWFVKN THYYDKRWSC ELFLLVSIST SVILMQHLLP ASYCDLLHKA 
351 AAHLGCWQKV DPALCSNVLQ HPWTEECMWP QGVLVKHSKN VYKAVGHYNV 
401 AIPSDVSHFR FHFFFSKPLR ILNILLLLEG AVIVYQLYSL MSSEKWHQTI 
451 SLALILFSNY YAFFKLLRDR LVLGKAYSYS ASPQRDLDHR FS 

ORF from 128 bp to 1603 bp; peptide length: 492 
Category: similarity to unknown protein 
Prosite motifs: LEUCINE_ZIPPER (210-232) 



BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82c20, frame 2 

TREMBL:CEAF3151_8 gene: "D1007.5"; Caenorhabditis elegans cosmid
D1007., N = 2, Score = 247, P = 4.6e-29

>TREMBL:CEAF3151_8 gene: "D1007.5"; Caenorhabditis elegans cosmid D1007. 
Length = 512 

HSPs: 

Score = 247 (37.1 bits), Expect = 4.6e-29, Sum P(2) = 4.6e-29
Identities = 58/204 (28%), Positives = 102/204 (50%)

Query: 291 VSSMLSAYYVAFVPVWFVKNTHYYDKRWSCELFLLVSISTSVILMQHLLPASYCDLLHKA 350

+S ML +V F + ++ W C+L ++V ++ + + +L P +Y DLLH+A

Sbjct: 299 LSIMLPCIFVPFKTSQGIPQKILINEVWECQLAIVVGLTAFSLYVAYLSPLNYLDLLHRA 358

Query: 351 AAHLGCWQKVD-PAL CSNVLQHPWTEECMWPQGVLVKHSKN- VYKAVGHYNV 400

A HLG W +++ P + + PW+E C++ G V+ Y+A ++

Sbjct: 359

Query: 401 AIPSDVSHFRFHFFFSKPLRILNILLLLEGAVIVYQLYSLMSSEKWHQTISLALILFSNY 460

A P H F KP ++NI+ E +I Q + L+ + W ++ L++F+NY

Sbjct: 419 AHPESSRHNTFFKVLRKPNNLINIMCSFEFLLIFIQFWMLVLTNDWQHIVTFVLLMFANY 478

Query: 461 YAFFKLLRDRLVLGKAYSYSASPQRDL 487

F KL +D+++L + Y S Q DL
Sbjct: 479 LLFAKLFKDKIILSRI YEPS QEDL 502

Score = 178 (26.7 bits), Expect = 4.3e-21, Sum P(2) = 4.3e-21
Identities:

Query: 262
Sbjct: 262
Query: 318
Sbjct: 322
Query: 369
Sbjct: 382

W C+L ++V ++ + + +L P +Y DLLH+AA HLG W +++ P +

PW+E C++ G V+ Y+A ++ + + R + FF K LR N L+
Score = 146 (21.9 bits), Expect = 4.6e-29, Sum P(2) = 4.6e-29
Identities = 34/86 (39%), Positives = 50/86 (58%)

Query: 52 SSPPLATQTVVPLQHCKIPELP-VQASILFELQLFFCQLIALFVHYINIYKTVWWYPPSH 110

+S P A+ + + H P++ Q + FE LF ++ALF+ Y+NIYKT+WW P S+ 
Sbjct: 19 ASIPRASGVTLSV-HPIWPDIQFTQGELFFECTLFLYSVLALFLQYLNIYKTLWWLPKSY 77 

Query: 111 PPSHTSLNFHLIDFNLLMVTTIVLGRR 137 

H SL FHLI+ L ++LG R 
Sbjct: 78 --WHYSLKFHLINPYFLSCVGLLLGWR 102

Score = 39 (5.9 bits), Expect = 6.8e-18, Sum P(2) = 6.8e-18 
Identities = 12/41 (29%), Positives = 20/41 (48%) 

Query: 154 LFRSILLFLTRFTVLTATGWSLCRSLIHLFRTYSFLNLLFL 194 

L+ + LFL ++ + T W L +S H + +N FL 
Sbjct: 53 LYSVLALFL-QYLNIYKTLWWLPKSYWHYSLKFHLINPYFL 92 

Pedant information for DKFZphfbr2_82c20, frame 2 

Report for DKFZphfbr2_82c20.2

[LENGTH] 492

[MW] 56274.05

[pI] 9.51

[HOMOL] TREMBL:CEAF3151_8 gene: "D1007.5"; Caenorhabditis elegans cosmid D1007. 4e

[PROSITE] LEUCINE_ZIPPER 1

[PROSITE] AMIDATION 2

[PROSITE] MYRISTYL 5

[PROSITE] CAMP_PHOSPHO_SITE 2

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 5

[PROSITE] ASN_GLYCOSYLATION 1

[KW] TRANSMEMBRANE 7

[KW] LOW_COMPLEXITY 8.74 %
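The [MW] field is consistent with a sum of average residue masses plus one water for the free termini. The exact mass table Pedant used is not stated, however. The sketch below works under that assumption with commonly used average residue masses in daltons; treat the table as an assumption to verify against your own reference:

```python
# Average residue masses in Da (free amino acid minus one water); values assumed, check before use
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528


def average_mw(peptide: str) -> float:
    """Average molecular weight of an unmodified linear peptide (KeyError on non-standard letters)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER


if __name__ == "__main__":
    print(f"{average_mw('GG'):.2f}")  # glycylglycine
```

If Pedant used average masses, the result for the 492-residue peptide should land close to 56274.05. It may still differ in the decimals, because reference tables vary slightly.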

SEQ MGGRRGPNRTSYCRNPLCEPGSSGGSSGSHTSSASVTSVRSRTRSSSGTGLSSPPLATQT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccceeeccccccccccccccccccee 
MEM 



SEQ VVPLQHCKIPELPVQASILFELQLFFCQLIALFVHYINIYKTVWWYPPSHPPSHTSLNFH

SEG 

PRD eeeccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeeccccccccceeeeee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMM 

SEQ LIDFNLLMVTTIVLGRRFIGSIVKEASQRGKVSLFRSILLFLTRFTVLTATGWSLCRSLI 

SEG 

PRD eeehhhhhhhhhhhhheeeehhhhhhhcccchhhhhhhhhhhhhhhhhhcccchhhhhhh 

MEM MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ HLFRTYSFLNLLFLCYPFGMYIPFLQLNCDLRKTSLFNHMASMGPREAVSGLAKSRDYLL

SEG 

PRD hhhhhhhhheeeeeeecccccceeeeccccchhhhhhhhhhccchhhhhhhhhhhhhhhh 

MEM 

SEQ TLRETWKQHTRQLYGPDAMPTHACCLSPSLIRSEVEFLKMDFNWRMKEVLVSSMLSAYYV 

SEG 

PRD hhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhcchhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ AFVPVWFVKNTHYYDKRWSCELFLLVSISTSVILMQHLLPASYCDLLHKAAAHLGCWQKV

SEG 

PRD heeeeeeeeccccccchhhhhhhhhhhcchhhhhhhhhhccchhhhhhhhhhhhhhhccc 

MEM MMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ DPALCSNVLQHPWTEECMWPQGVLVKHSKNVYKAVGHYNVAIPSDVSHFRFHFFFSKPLR 

SEG xx 

PRD ccccccccccccccceeecccceeeeeccceeeeccccccccccccccceeeeeecccch 

MEM MMMMMMMMMM 

SEQ ILNILLLLEGAVIVYQLYSLMSSEKWHQTISLALILFSNYYAFFKLLRDRLVLGKAYSYS 

SEG xxxxxxxx 

PRD hhhhhhhhhhheeeeehhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc 

MEM MMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM 
SEQ ASPQRDLDHRFS

SEG 

PRD ccchhhhhhccc 

MEM 



Prosite for DKFZphfbr2_82c20.2

PS00001   8->12      ASN_GLYCOSYLATION   PDOC00001
PS00002   47->51     GLYCOSAMINOGLYCAN   PDOC00002
PS00004   212->216   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   316->320   CAMP_PHOSPHO_SITE   PDOC00004
PS00005   38->41     PKC_PHOSPHO_SITE    PDOC00005
PS00005   147->150   PKC_PHOSPHO_SITE    PDOC00005
PS00005   241->244   PKC_PHOSPHO_SITE    PDOC00005
PS00005   245->248   PKC_PHOSPHO_SITE    PDOC00005
PS00005   443->446   PKC_PHOSPHO_SITE    PDOC00005
PS00006   241->245   CK2_PHOSPHO_SITE    PDOC00006
PS00006   273->277   CK2_PHOSPHO_SITE    PDOC00006
PS00006   342->346   CK2_PHOSPHO_SITE    PDOC00006
PS00008   21->27     MYRISTYL            PDOC00008
PS00008   24->30     MYRISTYL            PDOC00008
PS00008   28->34     MYRISTYL            PDOC00008
PS00008   48->54     MYRISTYL            PDOC00008
PS00008   231->237   MYRISTYL            PDOC00008
PS00009   2->6       AMIDATION           PDOC00009
PS00009   134->138   AMIDATION           PDOC00009
PS00029   168->190   LEUCINE_ZIPPER      PDOC00029



(No Pfam data available for DKFZphfbr2_82c20.2)
DKFZphfbr2_82e17



group: transmembrane protein 

DKFZphfbr2_82e17 encodes a novel 311 amino acid protein with very weak similarity to C.
elegans cosmid R01B10. 

The novel protein contains 6 transmembrane regions. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



similarity to C. elegans "R01B10.5" ; 
membrane regions: 6 

Summary DKFZphfbr2_82e17 encodes a novel 311 amino acid protein with
similarity to a hypothetical C. elegans protein. 



similarity to C. elegans "R01B10.5" 

complete cDNA, EST HS763158 extends the sequence, complete cds, EST
hits

six potential transmembrane domains 
Sequenced by DKFZ 

Locus: /map="779_C_?; 818_A_1; 877_C_1; 734_C_12; 760_E_11; 171.7 cR from top of Chr14 linkage
group"

Insert length: 1618 bp 

Poly A stretch at pos. 1608, polyadenylation signal at pos. 1588 



1 CTGATCTAGT GCTTCTCGAA AAAAACCTTC AGGCGGCCCA TGGCTGTCGA 
51 TATTCAACCA GCATGCCTTG GACTTTATTG TGGGAAGACC CTATTATTTA 
101 AAAATGGCTC AACTGAAATA TATGGAGAAT GTGGGGTATG CCCAAGAGGA 
151 CAGAGAACGA ATGCACAGAA ATATTGTCAG CCTTGCACAG AATCTCCTGA 
201 ACTTTATGAT TGGCTCTATC TTGGATTTAT GGCAATGCTT CCTCTGGTTT 
251 TACATTGGTT CTTCATTGAA TGGTACTCGG GGAAAAAGAG TTCCAGCGCA 
301 CTTTTCCAAC ACATCACTGC ATTATTTGAA TGCAGCATGG CAGCTATTAT 
351 CACCTTACTT GTGAGTGATC CAGTTGGTGT TCTTTATATT CGTTCATGTC 
401 GAGTATTGAT GCTTTCTGAC TGGTACACGA TGCTTTACAA CCCAAGTCCA 
451 GATTACGTTA CCACAGTACA CTGTACTCAT GAAGCCGTCT ACCCACTATA 
501 TACCATTGTA TTTATCTATT ACGCATTCTG CTTGGTATTA ATGATGCTGC 
551 TCCGACCTCT TCTGGTGAAG AAGATTGCAT GTGGGTTAGG GAAATCTGAT 
601 CGATTTAAAA GTATTTATGC TGCACTTTAC TTCTTCCCAA TTTTAACCGT 
651 GCTTCAGGCA GTTGGTGGAG GCCTTTTATA TTACGCCTTC CCATACATTA
701 TATTAGTGTT ATCTTTGGTT ACTCTGGCTG TGTACATGTC TGCTTCTGAA 
751 ATAGAGAACT GCTATGATCT TCTGGTCAGA AAGAAAAGAC TTATTGTTCT 
801 CTTCAGCCAC TGGTTACTTC ATGCCTATGG AATAATCTCC ATTTCCAGAG 
851 TGGATAAACT TGAGCAAGAT TTGCCCCTTT TGGCTTTGGT ACCTACACCA 
901 GCCCTTTTTT ACTTGTTCAC TGCAAAATTT ACCGAACCTT CAAGGATACT 
951 CTCAGAAGGA GCCAATGGAC ACTGAGTGTA GACATGTGAA ATGCCAAAAA 
1001 CCTGAGAAGT GCTCCTAATA AAAAAGTAAA TCAATCTTAA CAGTGTATGA 
1051 GAACTATTCT ATCATATATG GGAACAAGAT TGTCAGTATA TCTTAATGTT 
1101 TGGGTTTGTC TTTGTTTTGT TTATGGTTAG ACTTACAGAC TTGGAAAATG 
1151 CAAAACTCTG TAATACTCTG TTACACAGGG TAATATTATC TGCTACACTG 
1201 GAAGGCCGCT AGGAAGCCCT TGCTTCTCTC AACAGTTCAG CTGTTCTTTA 
1251 GGGCAAAATC ATGTTTCTGT GTACCTAGCA ATGTGTTCCC ATTTTATTAA 
1301 GAAAAGCTTT AACACGTGTA ATCTGCAGTC CTTAACAGTG GCGTAATTGT 
1351 ACGTACCTGT TGTGTTTCAG TTTGTTTTTC ACCTATAATG AATTGTAAAA 
1401 ACAAACATAC TTGTGGGGTC TGATAGCAAA CATAGAAATG ATGTATATTG
1451 TTTTTTGTTA TCTATTTATT TTCATCAATA CAGTATTTTG ATGTATTGCA 
1501 AAAATAGATA ATAATTTATA TAACAGGTTT TCTGTTTATA GATTGGTTCA 
1551 AGATTTGTTT GGATTATTGT TCCTGTAAAG AAAACAATAA TAAAAAGCTT 
1601 ACCTACATAA AAAAAAAA 



BLAST Results 



Entry HS981146 from database EMBL: 
human STS WI-6253. 
Length = 208 
Minus Strand HSPs: 

Score = 1040 (156.0 bits), Expect = 1.9e-40, P = 1.9e-40 
Identities = 208/208 (100%), Positives = 208/208 (100%), Strand = Minus / Plus

Entry HSG20716 from database EMBL: 
human STS A006D06.
Length = 195
Minus Strand HSPs:

Score = 975 (146.3 bits), Expect = 1.8e-37, P = 1.8e-37

Identities = 195/195 (100%), Positives = 195/195 (100%), Strand = Minus / Plus



Medline entries 



No Medline entry 



Peptide information for frame 1 



1 MAVDIQPACL GLYCGKTLLF KNGSTEIYGE CGVCPRGQRT NAQKYCQPCT 

51 ESPELYDWLY LGFMAMLPLV LHWFFIEWYS GKKSSSALFQ HITALFECSM 

101 AAIITLLVSD PVGVLYIRSC RVLMLSDWYT MLYNPSPDYV TTVHCTHEAV 

151 YPLYTIVFIY YAFCLVLMML LRPLLVKKIA CGLGKSDRFK SIYAALYFFP 

201 ILTVLQAVGG GLLYYAFPYI ILVLSLVTLA VYMSASEIEN CYDLLVRKKR 

251 LIVLFSHWLL HAYGIISISR VDKLEQDLPL LALVPTPALF YLFTAKFTEP 

301 SRILSEGANG H 



ORF from 40 bp to 972 bp; peptide length: 311 
Category: similarity to unknown protein 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82e17, frame 1

TREMBL:AF068718_5 gene: "R01B10.5"; Caenorhabditis elegans cosmid
R01B10., N = 1, Score = 399, P = 1.4e-36



>TREMBL:AF068718_5 gene: "R01B10.5"; Caenorhabditis elegans cosmid R01B10. 
Length = 670 

HSPs: 

Score = 399 (59.9 bits), Expect = 1.4e-36, P = 1.4e-36 
Identities = 95/280 (33%), Positives = 152/280 (54%)

Query: 2 AVDIQPACLGLYCGKTLLFKN------------GSTEIYGECGVCPRGQRTNAQKYCQPC 49

A IQP+CLG +CG+T+L N GST + CG C G R NA C+ C 

Sbjct: 292 ASTIQPSCLG-FCGRTVLVGNYSEDVEATTTAAGSTSL-SRCGPCSFGYRNNAMSICESC 349 

Query: 50 TESPELYDWLYLGFMAMLPLVLHWFFIEWYSGKKSSSALFQ---HITALFECSMAAIITL 106

+ YDW+YL F+A+LPL+LH FI + K + ++ ++ + E +A +I +
Sbjct: 350 DTPLQPYDWMYLLFIALLPLLLHMQFIR-IARKYCRTRYYEVSEYLCVILENVIACVIAV 408 

Query: 107 LVSDPVGVLYIRSCRVLMLSDWYTMLYNPSPDYVTTVHCTHEAVYPLYTIVFIYYAFCLV 166 

L+ P ++ C + +WY YNP Y T+ CT+E V+PLY+I FI++ + 
Sbjct: 409 LIYPPRFTFFLNGCSKTDIKEWYPACYNPRIGYTKTMRCTYEVVFPLYSITFIHHLILIG 468 

Query: 167 LMMLLRPLLVKKIACGLGKSDRFKSIYAALYFFPILTVLQAVGGGLLYYAFPYIILVLSL 226 

+++LR L + L K+ K YAA+ PIL V+ AV G+++Y FPYI+L+ SL 
Sbjct: 469 SILVLRSTLYCVL LYKTYNGKPFYAAIVSVPILAVIHAVLSGVVFYTFPYILLIGSL 525 

Query: 227 VTLAVYMSASEI ENCYDLLVR KKRLIVLFSHWLLHAYGIISI 268

+ +++ +++VR LI L L+ ++G+I+I 

Sbjct: 526 WAMCFHLALEGKRPLKEMIVRIATSPTHLIFLSITMLMLSFGVIAI 571 



Pedant information for DKFZphfbr2_82e17, frame 1



Report for DKFZphfbr2_82e17.1

[LENGTH] 311

[MW] 35239.14

[pI] 7.91

[HOMOL] TREMBL:AF068718_5 gene: "R01B10.5"; Caenorhabditis elegans cosmid R01B10. 9e-36

[PROSITE] AMIDATION 1

[PROSITE] MYRISTYL 3

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 4

[PROSITE] ASN_GLYCOSYLATION 1

[KW] TRANSMEMBRANE 6

[KW] LOW_COMPLEXITY 7.72 %
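The [pI] field is an isoelectric point, the pH at which the peptide carries no net charge. A common way to compute it is to sum the Henderson-Hasselbalch charges of the ionisable groups and bisect on pH, since the net charge falls steadily as pH rises. The sketch below does this with an assumed pKa set. Published pI tools differ in their pKa tables, so the sketch is not expected to reproduce 7.91 or the other reported values exactly:

```python
# Assumed pKa values (one of several published sets); change them to match a reference tool
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(peptide: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a linear peptide at the given pH."""
    seq = peptide.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1
    counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return positive - negative


def isoelectric_point(peptide: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14] for the point where the net charge crosses zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(peptide, mid) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

For a single glycine only the two termini ionise, so the pI is their mean, 6.1, under these pKa values. Lysine-rich peptides land above 9, as the basic 7j4 and 82c20 proteins in this section do.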



SEQ MAVDIQPACLGLYCGKTLLFKNGSTEIYGECGVCPRGQRTNAQKYCQPCTESPELYDWLY 

SEG 

PRD cccccccccccccccceeeeccccceeecccccccccccccceeecccccccccchhhhh

MEM MMMMMM 

SEQ LGFMAMLPLVLHWFFIEWYSGKKSSSALFQHITALFECSMAAIITLLVSDPVGVLYIRSC 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhccccceeeeeece 

MEM MMMMMMMMMMMMMMMM............MMMMMMMMMMMMMMMMMMMMMMMMMMMMM...

SEQ RVLMLSDWYTMLYNPSPDYVTTVHCTHEAVYPLYTIVFIYYAFCLVLMMLLRPLLVKKIA

SEG xxxxxxxxxxxx....

PRD eeeeecceeeeecccccceeeeeeeceeeeeeeeceeeeehhhhhhhhhhhhhhhhhhee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM...

SEQ CGLGKSDRFKSIYAALYFFPILTVLQAVGGGLLYYAFPYIILVLSLVTLAVYMSASEIEN 

SEG 

PRD eecccccchhhhhhhhhhhccccccccccccceeeecceeeeehhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ CYDLLVRKKRLIVLFSHWLLHAYGIISISRVDKLEQDLPLLALVPTPALFYLFTAKFTEP 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhcccceeeechhhhhhceeeeeecccceeeeeeeccccc 

MEM MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM 

SEQ SRILSEGANGH 

SEG 

PRD ceeeeeccccc 

MEM MM 



Prosite for DKFZphfbr2_82e17.1

PS00001   22->26     ASN_GLYCOSYLATION   PDOC00001
PS00004   82->86     CAMP_PHOSPHO_SITE   PDOC00004
PS00005   80->83     PKC_PHOSPHO_SITE    PDOC00005
PS00005   119->122   PKC_PHOSPHO_SITE    PDOC00005
PS00005   186->189   PKC_PHOSPHO_SITE    PDOC00005
PS00005   294->297   PKC_PHOSPHO_SITE    PDOC00005
PS00006   234->238   CK2_PHOSPHO_SITE    PDOC00006
PS00006   236->240   CK2_PHOSPHO_SITE    PDOC00006
PS00006   269->273   CK2_PHOSPHO_SITE    PDOC00006
PS00008   11->17     MYRISTYL            PDOC00008
PS00008   37->43     MYRISTYL            PDOC00008
PS00008   182->188   MYRISTYL            PDOC00008
PS00009   80->84     AMIDATION           PDOC00009



(No Pfam data available for DKFZphfbr2_82e17.1)
DKFZphfbr2_82e4



group: signal transduction 

DKFZphfbr2_82e4 encodes a novel 473 amino acid protein with strong similarity to the 
calmodulin-binding proteins. 

The novel protein is similar to human and rat Ca2+/calmodulin-dependent protein kinase (EC
2.7.1.123), rat calmodulin-binding protein, calmodulin-binding protein kinase of Fugu rubripes
and Rattus norvegicus calcium/calmodulin-dependent protein kinase I. Calmodulin is the
archetype of the family of calcium-modulated proteins, of which nearly 20 members have been
found. Calmodulin is involved in regulation of growth and the cell cycle as well as in signal
transduction and the synthesis and release of neurotransmitters. The novel protein seems to be
involved in calmodulin-mediated pathways in human neuronal cells.

The new protein can find clinical application in modulating/blocking calmodulin-mediated 
pathways in human neuronal cells. 



strong similarity to calmodulin-binding proteins 

complete cDNA, complete cds, EST hits 
splice variant in comparison to rat I56542
ESTs HSZZ54543/HS1141907 define splice variant 
see also DKFZphfbr2_82g20 unspliced form 

Sequenced by DKFZ 

Locus: /map="200.5 cR from top of Chr3 linkage group"
Insert length: 2923 bp 

Poly A stretch at pos. 2913, polyadenylation signal at pos. 2890 



1 ATGCTGGAGG TTCGCTAGCC GAAGCGGCTG CATCTGGCGC CGCGTCTGCC 
51 CCGCGTGCTC GGAGCGGATT CTGCCCGCCG TCCCCGGAGC CCTCGGCGCC 
101 CCGCTGAGCC CGCGATCACT TCCTCCCTGT GACCAACCGG CGCTGCAGGT 
151 TAGAGCCTGG CAATGCCGTT TGGGTGTGTG ACTCTGGGTG ACAAGAAGAA 
201 CTATAACCAG CCATCGGAGG TGACTGACAG ATATGATTTG GGACAGGTCA 
251 TCAAGACTGA GGAGTTTTGT GAAATCTTCC GGGCCAAGGA CAAGACGACA 
301 GGCAAGCTGC ACACCTGCAA GAAGTTCCAG AAGCGGGACG GCCGCAAGGT 
351 GCGGAAAGCT GCCAAGAACG AGATAGGCAT CCTCAAGATG GTGAAGCATC 
401 CCAACATCCT ACAGCTGGTG GATGTGTTTG TGACCCGCAA GGAGTACTTT
451 ATCTTCCTGG AGCTGGCCAC GGGGAGGGAG GTGTTTGACT GGATCCTGGA
501 CCAGGGCTAC TACTCGGAGC GAGACACAAG CAACGTGGTA CGGCAAGTCC 
551 TGGAGGCCGT GGCCTATTTG CACTCACTCA AGATCGTGCA CAGGAATCTC 
601 AAGCTGGAGA ACCTGGTTTA CTACAACCGG CTGAAGAACT CGAAGATTGT 
651 CATCAGTGAC TTCCATCTGG CTAAGCTAGA AAATGGCCTC ATCAAGGAGC 
701 CCTGTGGGAC CCCCGAGTAT CTGGGCAACC CACCTTTCTA TGAGGAGGTG 
751 GAAGAAGATG ATTATGAGAA CCATGATAAG AATCTCTTCC GCAAGATCCT 
801 GGCTGGTGAC TATGAGTTTG ACTCTCCATA TTGGGATGAT ATTTCGCAGG 
851 CAGCCAAAGA CCTGGTCACA AGGCTGATGG AGGTGGAGCA AGACCAGCGG 
901 ATCACTGCAG AAGAGGCCAT CTCCCATGAG TGGATTTCTG GCAATGCTGC 
951 TTCTGATAAG AACATCAAGG ATGGTGTCTG TGCCCAGATT GAAAAGAACT 
1001 TTGCCAGGGC CAAGTGGAAG AAGGCTGTCC GAGTGACCAC CCTCATGAAA 
1051 CGGCTCCGGG CACCAGAGCA GTCCAGCACG GCTGCAGCCC AGTCGGCCTC 
1101 AGCCACAGAC ACTGCCACCC CCGGGGCTGC AGGTGGGGCC ACAGCTGCAG 
1151 CTGCGAGTGG AGCTACCTCA GCCCCTGAGG GTGATGCTGC TCGTGCTGCA 
1201 AAGAGTGATA ATGTGGCCCC CGCAGACCGT AGTGCCACCC CAGCCACAGA 
1251 TGGAAGTGCC ACCCCAGCCA CTGATGGCAG TGTCACCCCA GCCACCGATG 
1301 GAAGCATCAC TCCAGCCACT GATGGGAGTG TCACCCCAGC CACTGACAGG 
1351 AGCGCTACTC CAGCCACTGA TGGGAGAGCC ACACCAGCCA CAGAAGAGAG 
1401 CACTGTGCCC ACCACCCAAA GCAGTGCCAT GCTGGCCACC AAGGCAGCTG 
1451 CCACCCCTGA GCCGGCTATG GCCCAGCCGG ACAGCACAGC CCCAGAGGGC
1501 GCCACAGGCC AGGCTCCACC CTCTAGTAAA GGGGAAGAGG CTGCTGGTTA 
1551 TGCCCAGGAG TCTCAAAGGG AGGAGGCCAG CTGAGTAGGC AGCCTGGTGA 
1601 GGGGGGGCAG GGGATGGGCA GGAGGGTGGG AGAGTGGATG AGGGGCTTCT 
1651 CACTGTACAT AGAGTCACTG GCATGATGCC CTCGCTCCCC CATGCCCCCA 
1701 CATCCCAGTG GGGCATAACT AGGGGTCACG GGAGAGCAGT CTCGTCTCCT 
1751 GTGTGTATGT GTGTGAGTGG TGGGCAGGCC AGTGGCAGGG CCGGCCCCAG 
1801 CCCCTGCATG GATTCCTTGT GGCTTTTCTG TCTTTTGCTA GCTTCACCAG 
1851 TTTCTGTTCC TTGTGGGATG CTGCTCTAGG GATACTCAGG GGGCTCCTGC 
1901 TCTCCTTCCC CTTCCCTTCT TGCCTCACCA TTCCCCTAGG CAGGCCCTGC 
1951 AGGTCCCACA CTCTCCCAGG CCCTAAACTT GGGCGGCCTT GCCCTGAGAG 
2001 CTGGTCCTCC AGCGAGGCCC TGTCAGCGGT CTTAGGCTCC TGCACATGAA 
2051 GGTGTGTGCC TGTGGTGTGT GGGCTGCTCT AGGAGCAGAT ACAGGCTGGT 
2101 ATAGAGGATG CAGAAAGGTA GGGCAGTATG TTTAAGTCCA GACTTGGCAC 
2151 ATGGCTAGGG ATACTGCTCA CTAGCTGTGG AGGTCCTCAG GAGTGGAGAG 
2201 AATGAGTAGG AGGGCAGAAG CTTCCATTTT TGTCCTTCCT AAGACCCTGT 






WO 01/12659 



PCT/IB00/01496 



2251 TATTTGTGTT ATTTCCTGCC TTTCCGAGTC CTGCAGTGGG CTGCCCTGTA 
2301 CCCTGAACCT CATGAGCCTC TAAGGGAAAG GAGGAACAAT TAGGACGTGG 
2351 CAATGAGACC TGGCAGGGCA GAGTACAAGC CCAGCACCCA GTGTCCCAGC 
2401 CTTACTGGGT CCTTACCCTG GGCCAAACAG GGAGGGCTGA TACCTCCTTG 
2451 CTCTTCCTAG ATGCCCACCT CCTACAATCT CAGCCCACAA GTCCTCTCCA 
2501 CCCTAGGGGG CTTGCTGCAT GGCAATAACT CATAATCTGA TTTGGAGGTT 
2551 TGCCCTTTAC AGGGGCAGAT TTTCTGCTCA GTTCAACAAT GAAATGAAGA 
2601 GGAACTCCCT CTTTCTACAG CTCACTTCTA TCAGAGGCCC AGGTGCCTCA 
2651 GAGCCACATT GAGTTGCTTT TTCTGGGATG AGGAAGTAGG GTTAAACTCC 
2701 CCAGTTTCCT GAGGGAGGCT CCTGACAGGT GCCCTTTGTC AGACCCTACC 
2751 ACAGCCTGGA TAGGCAGCCA CATTGGTCCT CGCCCTTGCT CGGCACTCCG 
2801 TGGTGGTCCT GCCCTTCTCC CTGCATGCCT GTGGGTCTGC TCTGGTGTGT 
2851 GAAGGTCGGT GGGTTAACTG TGTGCCTACT GAACCTGGCA AATAAACATC 
2901 ACCCTGCAAA GCCAAAAAAA AAA 



BLAST Results 



Entry HS452352 from database EMBL:
human STS WI-1531B.
Length = 350
Minus Strand HSPs:

Score = 1547 (232.1 bits), Expect = 5.2e-63, P = 5.2e-63

Identities = 331/348 (95%), Positives = 331/348 (95%), Strand = Minus / Plus



Medline entries 



94110847: 

J Neurosci 1994 Jan;14(1):1-13

1G5: a calmodulin-binding, vesicle-associated, protein 
kinase-like protein enriched in forebrain neurites. 

Godbout M, Erlander MG, Hasel KW, Danielson PE, Wong KK, Battenberg EL, Foye PE, Bloom FE, Sutcliffe JG



Peptide information for frame 1 



1 MPFGCVTLGD KKNYNQPSEV TDRYDLGQVI KTEEFCEIFR AKDKTTGKLH 

51 TCKKFQKRDG RKVRKAAKNE IGILKMVKHP NILQLVDVFV TRKEYFIFLE 

101 LATGREVFDW ILDQGYYSER DTSNVVRQVL EAVAYLHSLK IVHRNLKLEN 

151 LVYYNRLKNS KIVISDFHLA KLENGLIKEP CGTPEYLGNP PFYEEVEEDD 

201 YENHDKNLFR KILAGDYEFD SPYWDDISQA AKDLVTRLME VEQDQRITAE 

251 EAISHEWISG NAASDKNIKD GVCAQIEKNF ARAKWKKAVR VTTLMKRLRA 

301 PEQSSTAAAQ SASATDTATP GAAGGATAAA ASGATSAPEG DAARAAKSDN 

351 VAPADRSATP ATDGSATPAT DGSVTPATDG SITPATDGSV TPATDRSATP 

401 ATDGRATPAT EESTVPTTQS SAMLATKAAA TPEPAMAQPD STAPEGATGQ 

451 APPSSKGEEA AGYAQESQRE EAS 



ORF from 163 bp to 1581 bp; peptide length: 473 
Category: strong similarity to known protein 



BLASTP hits



Entry S50193 from database PIR: 

Ca2+/calmodulin-dependent protein kinase (EC 2.7.1.123) I - rat 
Length = 374

Score = 371 (130.6 bits), Expect = 2.2e-66, Sum P(2) = 2.2e-66
Identities = 74/176 (42%), Positives = 115/176 (65%)

Entry S57347 from database PIR:

Ca2+/calmodulin-dependent protein kinase (EC 2.7.1.123) I - human
Length = 370

Score = 369 (129.9 bits), Expect = 4.6e-66, Sum P(2) = 4.6e-66
Identities = 74/176 (42%), Positives = 114/176 (64%)



Alert BLASTP hits for DKFZphfbr2_82e4, frame 1 

PIR:I56542 calmodulin-binding protein - rat, N = 2, Score = 1246, P = 4e-228

TREMBLNEW:FRU010348_3 product: "calmodulin binding protein kinase";
Fugu rubripes UBE1-like gene, PRGFR2 gene and gene encoding calmodulin
binding protein kinase, clone 168J21, N = 2, Score = 846, P = 2.6e-139

TREMBL:RNPRKI_1 product: "protein kinase I"; Rattus norvegicus
calcium/calmodulin-dependent protein kinase I mRNA, complete cds., N =
2, Score = 364, P = 5.1e-63



>PIR:I56542 calmodulin-binding protein - rat
Length = 504

HSPs:

Score = 1246 (186.9 bits), Expect = 4.0e-228, Sum P(2) = 4.0e-228
Identities = 255/289 (88%), Positives = 259/289 (89%)



GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLMEVEQDQRI 



TAEEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRAPEQS TA 



Query: 


188 


Sbjct: 


216 


Query: 


248 


Sbjct: 


276 


Query: 


308 


Sbjct: 


336 


Query: 


360 


Sbjct: 


391 


Query: 


420 


Sbjct: 


451 


Score 


= 978 


Identities 1 


Query: 


1 


Sbjct: 


1 


Query: 


61 


Sbjct: 


61 


Query: 


121 


Sbjct: 


121 


Query: 


181 


Sbjct: 


181 



AT DT AT PG AAGGAT AAAAS GAT S A P E GDAARAAKSDNVAPADRSAT 359 

+D ATPGAAGGA AAAA GA A GDA AAKSD++A ADRSAT 

-SDAATPGAAGGAVAAAAGGAAPASGASATVGTGGDAGCAAKSDDMASADRSAT 390 



PAT DGS AT PAT DGSVTPATDGSIT PAT DG S VT PAT DRSATPATDGRAT P ATEEST V P Q 



SSA A KAAATPEPA+AQPDSTA EGATGQAPPSSKGEEA G AQESQR E S 



186/187 (99%), Positives = 187/187 (100%)



4.0e-228 



MPFGCVTLGDKKNYNQPSEVTDRYDLGQV+KTEEFCEIFRAKDKTTGKLHTCKKFQKRDG 



RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER 



DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP 



CGTPEYL 



Pedant information for DKFZphfbr2_82e4, frame 1 
Report for DKFZphfbr2_82e4.1



[LENGTH] 473

[MW] 51208.89

[pI] 5.30

[HOMOL] PIR: 156542 calmodulin-binding protein - rat 0.0

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YFR014c] 4e-30

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YFR014c] 4e-30

[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 4e-30

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 2e-26

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 2e-26

[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YDL101c] 8e-26

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YCL024w] 5e-24

[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 7e-23

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 7e-23

[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 1e-21

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 1e-21

[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 3e-19

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR477w] 3e-19

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 1e-16

[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001c] 3e-16

[FUNCAT] 03.13 meiosis [S. cerevisiae, YOR351c] 1e-15

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR122w] 3e-14

[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YCR073c] 6e-11

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YNR031c] 8e-11

[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YJL095w] 2e-09

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR362w] 1e-08

[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YLR362w] 1e-08

[FUNCAT] 10.04.11 key kinases [S. cerevisiae, YLR362w] 1e-08

[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031c] 7e-08

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YPL031c] 7e-08

[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 7e-08

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 1e-07

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YFL033c] 1e-07

[FUNCAT] 10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 5e-07

[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 8e-07

[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 5e-06

[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 5e-06

[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 1e-05

[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 1e-05

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 1e-05

[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 1e-05

[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183c] 8e-05

[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL183c] 8e-05

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDR523c] 2e-04

[FUNCAT] c energy conversion [M. genitalium, MG109] 3e-04

[BLOCKS] BL00107A Protein kinases ATP-binding region proteins 

[BLOCKS] BL00939F 

[SCOP] d1gol 5.1.1.1.9 MAP kinase Erk2 [rat Rattus norvegicus 3e-62

[SCOP] d1wfc 5.1.1.1.8 MAP kinase p38 [human (Homo sapiens) 5e-59

[SCOP] d1koa_2 5.1.1.1.7 (1-350) Twitchin, kinase domain [Caenorhabditi 1e-75

[SCOP] d1koba_ 5.1.1.1.6 Twitchin, kinase domain [California sea har 1e-72

[SCOP] d1phk 5.1.1.1.5 gamma-subunit of glycogen phosphorylase kinas 4e-65

[SCOP] d1irk 5.1.1.2.4 insulin receptor [Human (Homo sapiens) 2e-56

[SCOP] d1apme_ 5.1.1.1.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 4e-71

[SCOP] d1fgka_ 5.1.1.2.3 Fibroblast growth factor receptor 1 [human (Hom 1e-50

[SCOP] d1ydre_ 5.1.1.1.3 cAMP-dependent PK, catalytic subunit [bovine (Bo 3e-70

[SCOP] d1fmk_3 5.1.1.2.2 (168-437) c-src tyrosine kinase [human (Hom 5e-49

[SCOP] d1cdkb_ 5.1.1.1.2 cAMP-dependent PK, catalytic subunit [pig (Su 2e-72

[SCOP] d2hcka3 5.1.1.2.1 (167-437) Haemopoetic cell kinase Hck [huma 5e-46

[SCOP] d1csn 5.1.1.1.11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 9e-42

[SCOP] d1jsua_ 5.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) 1e-56

[SCOP] d1ckia_ 5.1.1.1.10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 9e-52

[EC] 2.7.1.38 Phosphorylase kinase 3e-29 

[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 8e-66 

[EC] 2.7.1.128 [Acetyl-CoA carboxylase] kinase 2e-17 

[EC] 2.7.1.117 Myosin-light-chain kinase 2e-38

[EC] 2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH)] kinase 2e-17

[EC] 2.7.1.37 Protein kinase 6e-28 

[PIRKW] phosphotransferase 8e-66 

[PIRKW] nucleus 2e-24 

[PIRKW] transferase 8e-30 

[PIRKW] calcium 2e-27 

[PIRKW] duplication 4e-19 

[PIRKW] tandem repeat 2e-31 

[PIRKW] phorbol ester binding 1e-16

[PIRKW] zinc 1e-16

[PIRKW] cell cycle control 2e-20

[PIRKW] serine/threonine-specific protein kinase 8e-66

[PIRKW] phospholipid binding 1e-16

[PIRKW] autophosphorylation 8e-66

[PIRKW] brain 1e-14

[PIRKW] heterotetramer 2e-16

[PIRKW] polymer 3e-29

[PIRKW] mitosis 2e-20

[PIRKW] magnesium 7e-22

[PIRKW] ATP 8e-66

[PIRKW] alternative initiators 1e-29

[PIRKW] phosphoprotein 8e-66

[PIRKW] apoptosis 2e-31

[PIRKW] glycoprotein 4e-19

[PIRKW] skeletal muscle 3e-28

[PIRKW] protein kinase 2e-28

[PIRKW] testis 3e-28

[PIRKW] signal transduction 1e-21

[PIRKW] cAMP binding 1e-16

[PIRKW] purine nucleotide binding 5e-25

[PIRKW] structural protein 4e-19

[PIRKW] calcium binding 3e-45

[PIRKW] alternative splicing 3e-45

[PIRKW] p-loop 5e-25

[PIRKW] lipoprotein 2e-16

[PIRKW] cardiac muscle 4e-19

[PIRKW] muscle 3e-28

[PIRKW] myristylation 2e-16

[PIRKW] EF hand 5e-29

[PIRKW] cell division 2e-38

[PIRKW] calmodulin binding 8e-66

[PIRKW] smooth muscle 7e-31

[SUPFAM] fibronectin type III repeat homology 7e-31

[SUPFAM] immunoglobulin homology 7e-31

[SUPFAM] ribosomal protein S6 kinase II 3e-26

[SUPFAM] calcium-dependent protein kinase 5e-29

[SUPFAM] AMP-activated protein kinase 7e-22

[SUPFAM] protein kinase akt 1e-14

[SUPFAM] protein kinase SPK1 3e-20

[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 2e-36

[SUPFAM] Ca2+/calmodulin-dependent protein kinase 3e-45

[SUPFAM] calmodulin repeat homology 5e-29

[SUPFAM] protein kinase DUN1 2e-24

[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 1e-14

[SUPFAM] death-associated protein kinase 2e-31

[SUPFAM] myosin-light-chain kinase, nonmuscle 1e-29

[SUPFAM] pleckstrin repeat homology 1e-14

[SUPFAM] ankyrin repeat homology 2e-31

[SUPFAM] protein kinase homology 8e-66

[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 8e-36

[SUPFAM] twitchin 1e-18

[SUPFAM] protein kinase C zinc-binding repeat homology 1e-16

[SUPFAM] titin 4e-19

[SUPFAM] protein kinase cdr1 2e-20

[SUPFAM] kinase-related transforming protein 2e-38

[SUPFAM] Ca2+/calmodulin-dependent protein kinase I 8e-66

[SUPFAM] kinase interaction domain homology 2e-24

[SUPFAM] protein kinase C mu 1e-16

[PROSITE] AMIDATION 1

[PROSITE] MYRISTYL 3

[PROSITE] CK2_PHOSPHO_SITE 10

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 11

[PFAM] Eukaryotic protein kinase domain

[KW] All Alpha

[KW] 3D

[KW] LOW_COMPLEXITY 7.40 %



SEQ MPFGCVTLGDKKNYNQPSEVTDRYDLGQVIKTEEFCEIFRAKDKTTGKLHTCKKFQKRDG 

SEG 

Ia06- CEETTTGGGCEEEEEECBCGGGGGEEEEEETTTTCEEEEEEEEC 

SEQ RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER

SEG 

Ia06- HHHHHHHHHCCTTTBCCEEEEEEETTEEEEEECCCCCEEHHHHHHHTTTTBHH 

SEQ DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP 

SEG 

Ia06- HHHHHHHHHHHHHHHHHHHCCCTTTTTTTTEEECCCTTTTCEEECCCTTTTCHHHHHCCC 

SEQ CGTPEYLGNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLME 

SEG 

Ia06- HHHHHHHCCTTTTTT THHHHHHHHHCCCCCCTTTTTTTTCHHHHHHHHHHCT 

SEQ VEQDQRITAEEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRA 

SEG 

Ia06- TTGGGCCCHHHHHHTTTTTTCCCCCCBHHHHHHHHHHHHHCCTTTTTTBTHHHHHHHC . . 

SEQ PEQSSTAAAQSASATDTATPGAAGGATAAAASGATSAPEGDAARAAKSDNVAPADRSATP 

SEG . . xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

Ia06- 
SEQ ATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVPTTQS

SEG 

Ia06- 

SEQ SAMLATKAAATPEPAMAQPDSTAPEGATGQAPPSSKGEEAAGYAQESQREEAS

SEG 



Prosite for DKFZphfbr2_82e4.1



PS00005  21->24    PKC_PHOSPHO_SITE  PDOC00005
PS00005  46->49    PKC_PHOSPHO_SITE  PDOC00005
PS00005  51->54    PKC_PHOSPHO_SITE  PDOC00005
PS00005  91->94    PKC_PHOSPHO_SITE  PDOC00005
PS00005  103->106  PKC_PHOSPHO_SITE  PDOC00005
PS00005  118->121  PKC_PHOSPHO_SITE  PDOC00005
PS00005  138->141  PKC_PHOSPHO_SITE  PDOC00005
PS00005  264->267  PKC_PHOSPHO_SITE  PDOC00005
PS00005  394->397  PKC_PHOSPHO_SITE  PDOC00005
PS00005  454->457  PKC_PHOSPHO_SITE  PDOC00005
PS00005  467->470  PKC_PHOSPHO_SITE  PDOC00005
PS00006  7->11     CK2_PHOSPHO_SITE  PDOC00006
PS00006  91->95    CK2_PHOSPHO_SITE  PDOC00006
PS00006  103->107  CK2_PHOSPHO_SITE  PDOC00006
PS00006  118->122  CK2_PHOSPHO_SITE  PDOC00006
PS00006  248->252  CK2_PHOSPHO_SITE  PDOC00006
PS00006  313->317  CK2_PHOSPHO_SITE  PDOC00006
PS00006  336->340  CK2_PHOSPHO_SITE  PDOC00006
PS00006  442->446  CK2_PHOSPHO_SITE  PDOC00006
PS00006  455->459  CK2_PHOSPHO_SITE  PDOC00006
PS00006  467->471  CK2_PHOSPHO_SITE  PDOC00006
PS00007  456->464  TYR_PHOSPHO_SITE  PDOC00007
PS00007  127->136  TYR_PHOSPHO_SITE  PDOC00007
PS00008  260->266  MYRISTYL          PDOC00008
PS00008  321->327  MYRISTYL          PDOC00008
PS00008  324->330  MYRISTYL          PDOC00008
PS00009  59->63    AMIDATION         PDOC00009
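The Prosite hits above appear to use a 1-based start and an exclusive end: the PKC site listed as 21->24 covers the three residues T21-D22-R23, which match the pattern [ST]-x-[RK]. A minimal Python sketch of such a scan follows. The pattern strings are the published PROSITE patterns for the motifs named in these tables; `prosite_to_regex` and `scan` are hypothetical helpers, not Pedant's implementation, and the converter handles only literals, `[..]`, `{..}`, `x` and `(n)` or `(n,m)` repeat counts (no `<`/`>` anchors).

```python
"""Sketch: scan a protein for PROSITE-style patterns, keeping overlapping hits.

Hits are reported as 'start->end' with a 1-based start and an exclusive end,
the convention the Prosite tables in this section appear to follow.
"""

import re

# PROSITE patterns for the motifs reported in this section.
PATTERNS = {
    "PS00001 ASN_GLYCOSYLATION": "N-{P}-[ST]-{P}",
    "PS00005 PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
    "PS00006 CK2_PHOSPHO_SITE": "[ST]-x(2)-[DE]",
    "PS00008 MYRISTYL": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",
    "PS00009 AMIDATION": "x-G-[RK]-[RK]",
}

_ELEMENT = re.compile(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple, unanchored PROSITE pattern into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(protein: str, pattern: str) -> list[str]:
    """Return every hit, including overlapping ones, as 'start->end'."""
    # A capturing group inside a lookahead lets finditer report overlapping hits.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [f"{m.start() + 1}->{m.start() + 1 + len(m.group(1))}"
            for m in regex.finditer(protein)]
```

Applied to the 208-residue DKFZphfbr2_82gl4 peptide listed further on, `scan` with the PKC pattern returns `196->199` and `203->206`, the same two PS00005 hits as that clone's Prosite table. Like any PROSITE pattern scan, these are sequence matches only and say nothing about whether a site is actually modified.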



Pfam for DKFZphfbr2_82e4.1



HMM_NAME 

HMM 

Query 

HMM 

Query 

HMM 

Query 

HMM 

Query 

HMM 

Query 

HMM 

Query 



Eukaryotic protein kinase domain 



24 



*YeigRiIGeGsFGtVYkCiWr.TGeIVAlKIIkkrsms FlREIq 

Y +G++I F ++++++++ TG++ K++ KR+ + +EI 

YDLGQVIKTEEFCEIFRAKDKTTGKLHTCKKFQKRDGRKVRKAAKNEIG 



qMnn Ye rM 1 1 fCGTPWY* 
+ N ++ + CGTP+Y 
172 LEN — GLI KEPCGTPEY 



186 



188 



*GepPFyd dnMemlmrliqrf rrpfWpnCSeElyDFMr 

G PPFY+ + +++I++++++F +P+W+ +S ++D+++ 

GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVT 



wCWnyDPe kRPTFrQI LnHPWF* 
+++++ ++R+T+++++ H W+ 
237 RLMEVEQDQRITAEEAI SHEWI 258 



72 



IMRrLnHPNIIRFYDwFedddDHI YMIMEYMeGGDLFDYI rrngpMsEwe 
I+++++HPNI+++ D+F + +++ + +E++ G + FD+I ++G++SE++ 
73 ILKMVKHPNILQLVDVFV-TRKEYFIFLELATGREVFDWILDQGYYSERD 121 

I r f IMyQI LrGMeYLHSMgl IHRDLKPENILI DeN . . . gqIKIcDFGLAR 
++++Q+L++++YLHS +I+HR LK EN+ + ++ I I+DF LA+ 

122 TSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAK 171



236 
DKFZphfbr2_82gl4



group: transmembrane protein 

DKFZphfbr2_82gl4 encodes a novel 208 amino acid proline-rich protein without similarity to
known proteins.

The protein contains one transmembrane domain.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



unknown proline-rich protein
membrane regions: 1 

Summary DKFZphfbr2_82gl4 encodes a novel 208 amino acid protein. 



unknown proline-rich protein

complete cDNA, complete cds, EST hits 
TRANSMEMBRANE 1 

Sequenced by DKFZ 

Locus: /map="26.2 cR from top of Chr16 linkage group"
Insert length: 2059 bp 

Poly A stretch at pos. 2049, polyadenylation signal at pos. 2024 



1 AGAAGTGCGA CTGCCAGCTG CCGAGGCGTT CGGTCCTGCT GTTGCGGCCG 
51 CTGCCCCAGG GCTGCGGGGA CGCTCCCGGA GCCCTGCCTG TCCCCTGTCC 
101 ATCCAGGCCA GCAGCTGAAG GAGCCTCACC TGCCTCCCTT CTCTGAGTAG 
151 CACGGATTTG AGGAGAAGCA GCGAAGATGT CCAGCGAGCC TCCCCCTCCT 
201 TATCCTGGGG GCCCCACAGC CCCACTTCTG GAAGAGAAAA GTGGAGCCCC 
251 GCCCACCCCA GGCCGTTCCT CCCCAGCTGT GATGCAGCCC CCTCCAGGCA 
301 TGCCACTGCC CCCTGCGGAC ATTGGCCCCC CACCCTATGA GCCGCCGGGT 
351 CACCCAATGC CCCAGCCTGG CTTCATCCCA CCACACATGA GTGCAGATGG 
401 CACCTACATG CCTCCGGGTT TCTACCCTCC TCCAGGCCCC CACCCACCCA 
451 TGGGCTACTA CCCCCCAGGG CCCTACACGC CAGGGCCCTA CCCTGGCCCT
501 GGGGGCCACA CAGCCACAGT CCTGGTCCCT TCAGGAGCTG CCACCACGGT 
551 GACAGTGCTG CAGGGAGAGA TCTTTGAGGG AGCGCCTGTG CAGACGGTGT 
601 GTCCCCACTG CCAGCAGGCC ATCGCCACCA AGATCTCCTA CGAGATTGGC 
651 TTGATGAATT TCGTGCTGGG TTTCTTCTGT TGCTTCATGG GATGTGATCT 
701 GGGCTGCTGC CTGATCCCCT GCCTCATCAA TGACTTCAAG GATGTGACGC 
751 ACACATGCCC CAGCTGCAAA GCCTACATCT ACACGTACAA GCGCCTGTGC 
801 TAACGGAGCT GGGACTCGGG ACTCCCCCGC CTGTCAGTCT GGCCCCCTGT 
851 GCTTTGCTCC CTGCGCTCAG TGGTCACTTT CCCGCTCCCA CTTGGGGCTG 
901 GGAGCCGTGC CACCATCCCC TAGAAGTCCT GTCCTCTTCA CCCTGCCCTA 
951 CCTGAGCCGC TGACTCTTCT GGCAAAAATT CTGTTGGGAT TTAAGGCCAA 
1001 GGGTCAGTGG GTGGCAGGGG GCTGGCAATG AGCTTGTGTG TTGTTGGTCT 
1051 GCTTGGTGTG TGTGATCGGG AAGATAAGCT GGGAGGGGTC TCGTGCTGGG 
1101 GTCCTGATGC CTCTGTTTCC AAACAAGGTA CAGGTTCAGT CCAGACTCTT 
1151 TCCCCCTGGG ACCAACAGCA GCCAGAGCAG TTAGCCAGTT AGTCCCCAGG 
1201 CCTGTGGCCA CAGGCGTTTC TGACCTGCTG GGCCGAGAAT GGGTAAGTTG 
1251 TCTGGAGTCA GGTGGGCCCA CGTAGGACAG GGTCACAAAG CCTGGGTTTG 
1301 TTTCTGGGTA CTTTGCGCCT CTGGGGTGCT AGAGGTGGGG CATGGTGGCT 
1351 GGAAGTAAAA CTGCCAACTC TGGCCCTCAG AACTCTCAGG TATAGAAGCC 
1401 CAGGATGTCT AATACCCTGT CCCAGTGCCC GAGAGCTGCC TGGTGTCAGG 
1451 TAGAGAGGAC ACTGTACCTG GGTGAATGAT CAGACCCTGG TAGCTAAGAA 
1501 GGAACTTGTC CCTTTGAGTC AGTGTGCAGA CCCCCTTTCA GGCCATGCCT 
1551 CTGTGAACCC TGTATTGCTG GGGCCGGAAG GAGCCCCTGA GCCTAGCCCC 
1601 TTCCCGTCTG CCCTGTGTCC TCACTGCGTG TGGGTATGAC CTCTGCCTGG 
1651 TGGCTGGTGT ATCCCAACTG GGCAAGAGAT GGCAGAGGGT CCCCCTTGTG 
1701 GGTGCGCTTG GATGTGCAGA GCCTTCTCCA TGGATTTTCT TCCCTGTAAG 
1751 TGCCGGGCCC CCCACCCCAG CTGACAGGCT GTTGCTGTGC CTGCTCACAC 
1801 CTGCTCCTGC AGGCACACTG GGCTAGGGAC GAGGAAGGAG CAGCCACAAG 
1851 TGGTAGAACT GCCTTGGTGG ACACCAGCCT CGCCCTGTCT TTATTTCCTG 
1901 AATGGTTTGT GAACTTGCTC ACCTGGACCA CTGTATCCTG CCACTGTCCT 
1951 TCCTGGTCTC GCACTGCCAC TGCATGGCCT CCTGTCACTG TGAATCGTGG 
2001 CCCAGTCTCA GTTTGTAGTT TCTCATTAAA TTGGCCCTTT CACTCCCCCA 
2051 AAAAAAAAA 



BLAST Results 
Entry HS727347 from database EMBL:
human STS WI-16589.
Length = 275
Plus Strand HSPs:

Score = 1365 (204.8 bits), Expect = 3.0e-55, P = 3.0e-55

Identities = 275/276 (99%), Positives = 275/276 (99%), Strand = Plus / Plus



Medline entries 



No Medline entry 



Peptide information for frame 3 



1 MSSEPPPPYP GGPTAPLLEE KSGAPPTPGR SSPAVMQPPP GMPLPPADIG 

51 PPPYEPPGHP MPQPGFIPPH MSADGTYMPP GFYPPPGPHP PMGYYPPGPY 

101 TPGPYPGPGG HTATVLVPSG AATTVTVLQG EIFEGAPVQT VCPHCQQAIA 

151 TKISYEIGLM NFVLGFFCCF MGCDLGCCLI PCLINDFKDV THTCPSCKAY 

201 IYTYKRLC 

ORF from 177 bp to 800 bp; peptide length: 208 
Category: similarity to known protein 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82gl4 , frame 3 

PIR:S57447 HPBRII-7 protein - human, N = 1, Score = 206, P = 8.4e-16 

PIR:A47655 spliceosome-associated protein SAP 62 - human, N = 1, Score
= 198, P = 4.3e-15



>PIR:S57447 HPBRII-7 protein - human 
Length = 551

HSPs: 



Score = 206 (30.9 bits), Expect = 8.4e-16, P = 8.4e-16
Identities = 57/115 (49%), Positives = 62/115 (53%) 



Query: 


5 


PPPPYPGGPTAPLLEEKSGAPPTPGRSSPAVMQPPPGMPLPPADIGPP PYEP 


56 




PPPP+P G T P G P PG P PPPG LPP GPP P P 




Sbjct: 


226 


PPPPFPAGQTPP--RPPLGPPGPPGPPGP PPPGQVLPPPLAGPPNRGDRPPPPVLF 


279 


Query: 


57 


PGHPMPQP — GFIPPHMSADGTYMP-PGFYPPPGPHPPM-GYYPP-GPYTPGPYPGPGGH 


111 




PG P QP G +PP G P PG+ PPPGP PP G PP GP+ P P PGP G 




Sbjct: 


280 


PGQPFGQPPLGPLPP GPPPPVPGYGPPPGPPPPQQGPPPPPGPFPPRP-PGPLGP 


333 


Query: 


112 


TATVLVP 118 








T+ P 




Sbjct: 


334 


PLTLAPP 340 




Score = 177 (26.6 bits), Expect = 1.1e-12, P = 1.1e-12
Identities = 55/120 (45%), Positives = 61/120 (50%)




Query: 


5 


PPPPYPGGPTAP--LLEEKSGAPPTPG-RSSPAVM QP PPGMPLPPADIGPPPYE 


55 




P PP P GP P +L PP G R P V+ QP PP PLPP GPPP 




Sbjct: 


244 


PGPPGPPGPPPPGQVLPPPLAGPPNRGDRPPPPVLFPGQPFGQPPLGPLPP— GPPP-P 


299 


Query: 


56 


PPGHPMPQPGFIPPHMSADGTYMPPGFYPP — PGP-HP PMGYYPPGPYTPGPYPG PG 


109 






PG+ P PG PP G PPG +PP PGP PP+ PP P+ PGP PG P 




Sbjct: 


300 


VPGYG-PPPGPPPPQQ GPPPPPGPFPPRPPGPLGPPLTLAPP-PHLPGPPPGAPPPA 


354 


Query: 


110 


GHTATVLVP 118 




Sbjct: 


355 


H P 
PHVNPAFFP 363 




Score = 168 (25.2 bits), Expect = 1.1e-11, P = 1.1e-11
Identities = 47/118 (39%), Positives = 51/118 (43%)




Query: 


5 


PPPPYPG-GPTAPLLEEKSGAPPTPGRSSPAVMQP— PPGMPLPPADI-GPPPYEPPGHP 


60 
Sbjct:


296 


Query: 


61 


Sbjct: 


356 


Query: 


121 


Sbjct: 


411 


Score 


= 156 


Identities =


Query: 


6 


Sbjct: 


208 


Query: 


65 


Sbjct: 


263 


Score 


= 121 


Identities =


Query: 


23 


Sbjct: 


213 


Query: 


79 


Sbjct: 


266 



PPPP PG GP + G PP PG P P PP PP + GPPP PP P 



P F PP ++ MP P P P G PP PY G Y PG 



(23.4 bits), Expect = 2.1e-10, P = 2.1e-10
44/103 (42%), Positives = 50/103 (48%) 



PGG P G PP P +P +PP G P PP GPPP PG +P P 
/PGGDRFPGPAGPGGPPPPFPAGQTPP — RPPLGPPGPPGPPGPPP PGQVLPPP 262 



PP+ D PP +P P PP+G PPGP P GP PGP 



(18.2 bits), Expect = 5.2e-05, P = 5.2e-05
= 40/90 (44%), Positives = 45/90 (50%)



PG + P PP P PP +GPP P PPG P P PG +PP ++ 
ETGPAGPGGPPPPFPAGQTPPRPPLGPPGPPGPPG-P-PPPGQVLPPPLAG 265 



PP G PPP P P G P GP PGP P PG 



Pedant information for DKFZphfbr2_82gl4, frame 3 



Report for DKFZphfbr2_82gl4.3



[LENGTH] 208 

[MW] 21862.47 

[pI] 5.55

[PROSITE] MYRISTYL 3

[PROSITE] PKC_PHOSPHO_SITE 2

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 39.90 %



SEQ MSSEPPPPYPGGPTAPLLEEKSGAPPTPGRSSPAVMQPPPGMPLPPADIGPPPYEPPGHP 

SEG . . . .xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccchhhhhhccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ MPQPGFIPPHMSADGTYMPPGFYPPPGPHPPMGYYPPGPYTPGPYPGPGGHTATVLVPSG 

SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccceeeeecccc 

MEM 

SEQ AATTVTVLQGEIFEGAPVQTVCPHCQQAIATKISYEIGLMNFVLGFFCCFMGCDLGCCLI

SEG 

PRD cceeeeeeeeeeecccceeeeccchhhhhhhhhhhhhhhceeeeeeeeeecccccceeec 

MEM MMMMMMMMMMMMM 

SEQ PCLINDFKDVTHTCPSCKAYIYTYKRLC 

SEG '. . 

PRD eeeecccccccccccccceeeeeeeccc 

MEM MMMM 
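The MEM row marks a single membrane-spanning segment in the C-terminal part of the protein (rows starting at residues 121 and 181), but the report does not name the predictor behind it. As a rough, independent check, the sketch below applies a different, well-known technique: a Kyte-Doolittle hydropathy scan with a 19-residue window, where window averages above about 1.6 are commonly read as candidate transmembrane helices. This is an illustration under those assumptions, not the method behind the Pedant annotation, and the helper names are hypothetical.

```python
"""Sketch: Kyte-Doolittle hydropathy scan for a candidate membrane-spanning segment."""

# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; entry i covers residues i+1 .. i+window."""
    values = [KD_SCALE[aa] for aa in protein.upper()]
    if len(values) < window:
        raise ValueError("sequence is shorter than the window")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def most_hydrophobic_window(protein: str, window: int = 19) -> tuple[int, float]:
    """Return the 1-based start of the highest-scoring window and its mean score."""
    profile = hydropathy_profile(protein, window)
    best = max(range(len(profile)), key=profile.__getitem__)
    return best + 1, profile[best]
```

For the DKFZphfbr2_82gl4 peptide, the highest-scoring window falls near residue 160, inside the cysteine-rich GLMNFVLGFFCCFMGCDLGCCLI stretch, with a mean above 2, which is consistent with the single MEM segment reported above.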



Prosite for DKFZphfbr2_82gl4.3



PS00005  196->199  PKC_PHOSPHO_SITE  PDOC00005
PS00005  203->206  PKC_PHOSPHO_SITE  PDOC00005
PS00008  109->115  MYRISTYL          PDOC00008
PS00008  120->126  MYRISTYL          PDOC00008
PS00008  172->178  MYRISTYL          PDOC00008



(No Pfam data available for DKFZphfbr2_82gl4.3)
DKFZphfbr2_82il7



group: signal transduction 

DKFZphfbr2_82il7 encodes a novel 95 amino acid protein with similarity to the plasma membrane
substrate for the cAMP-dependent protein kinase.

The novel protein is a transmembrane protein with strong similarity to phospholemman, a
membrane substrate for the cAMP-dependent protein kinase. It appears to serve as a chloride
channel or as a chloride-channel regulator.

The new protein can find application in modulating/blocking cAMP-dependent protein kinase- 
dependent pathways. 



similarity to plasma membrane substrate for cAMP-dependent protein kinase 
complete cDNA, complete cds, EST hits 

potential start at Bp 31 matches Kozak consensus PyNNatgG 
might be a SODIUM/POTASSIUM-TRANSPORTING ATPASE
TRANSMEMBRANE 1 

Sequenced by DKFZ 

Locus: /map-"ll; 920_E_12; 786_ (A, H)_U; (797 , 802 )_ (E, H) _7" 
Insert length: 1647 bp 

Poly A stretch at pos. 1637, polyadenylation signal at pos. 1615 



1 AGTCTCGGAG GGGACCGGCT GTGCAGACGC CATGGAGTTG GTGCTGGTCT
51 TCCTCTGCAG CCTGCTGGCC CCCATGGTCC TGGCCAGTGC AGCTGAAAAG
101 GAGAAGGAAA TGGACCCTTT TCATTATGAT TACCAGACCC TGAGGATTGG
151 GGGACTGGTG TTCGCTGTGG TTCTCTTCTC GGTTGGGATC CTCCTTATCC
201 TAAGTCGCAG GTGCAAGTGC AGTTTCAATC AGAAGCCCCG GGCCCCAGGA
251 GATGAGGAAG CCCAGGTGGA GAACCTCATC ACCGCCAATG CAACAGAGCC
301 CCAGAAAGCA GAGAACTGAA GTGCAGCCAT CAGGTGGAAG CCTCTGGAAC
351 CTGAGGCGGC TGCTTGAACC TTTGGATGCA AATGTCGATG CTTAAGAAAA
401 CCGGCCACTT CAGCAACAGC CCTTTCCCCA GGAGAAGCCA AGAACTTGTG
451 TGTCCCCCAC CCTATCCCCT CTAACACCAT TCCTCCACCT GATGATGCAA
501 CTAACACTTG CCTCCCCGCT GCAGCCTGTG GTCCTGCCCA CCTCCCGTGA
551 TGTGTGTGTG TGTGTGTGTG TGTGTGACTG TGTGTGTTTG CTAACTGTGG
601 TCTTTGTGGC TACTTGTTTG TGGATGGTAT TGTGTTTGTT AGTGAACTGT
651 GGACTCGCTT TCCCAGGCAG GGGCTGAGCC ACACGGCCAT CTGCTCCTCC
701 CTGCCCCCGT GGCCCTCCAT CACCTTCTGC TCCTAGGAGG CTGCTTGTTG
751 CCCGAGACCA GCCCCCTCCC CTGATTTAGG GATGCGTAGG GTAAGAGCAC
801 GGGCAGTGGT CTTCAGTCGT CTTGGGACCT GGGAAGGTTT GCAGCACTTT
851 GTCATCATTC TTCATGGACT CCTTTCACTC CTTTAACAAA AACCTTGCTT
901 CCTTATCCCA CCTGATCCCA GTCTGAAGGT CTCTTAGCAA CTGGAGATAC
951 AAAGCAAGGA GCTGGTGAGC CCAGCGTTGA CGTCAGGCAG GCTATGCCCT
1001 TCCGTGGTTA ATTTCTTCCC AGGGGCTTCC ACGAGGAGTC CCCATCTGCC
1051 CCGCCCCTTC ACAGAGCGCC CGGGGATTCC AGGCCCAGGG CTTCTACTCT
1101 GCCCCTGGGG AATGTGTCCC CTGCATATCT TCTCAGCAAT AACTCCATGG
1151 GCTCTGGGAC CCTACCCCTT CCAACCTTCC CTGCTTCTGA GACTTCAATC
1201 TACAGCCCAG CTCATCCAGA TGCAGACTAC AGTCCCTGCA ATTGGGTCTC
1251 TGGCAGGCAA TAGTTGAAGG ACTTCCTGTT CCGTTGGGGC CAGCACACCG
1301 GGATGGATGG AGGGAGAGCA GAGGCCTTTG CTTCTCTGCC TACGTCCCCT
1351 TAGATGGGCA GCAGAGGCAA CTCCCGCATC CTTTGCTCTG CCTGTCAGTG
1401 GTCAGAGCGG TGAGCGAGGT GGGTTGGAGA CTCAGCAGGC TCCGTGCAGC
1451 CCTTGGGAAC AGTGAGAGGT TGAAGGTCAT AACGAGAGTG GGAACTCAAC
1501 CCAGATCCCG CCCCTCCTGT CCTCTGTGTT CCCGCGGAAA CCAACCAAAC
1551 CGTGCGCTGT GACCCATTGC TGTTCTCTGT ATCGTGACCT ATCCTCAACA
1601 ACAACAGAAA AAAGGAATAA AATATCCTTT GTTTCCTAAA AAAAAAA



BLAST Results 



Entry HS31455 from database EMBL: 
human STS WI-2739. 
Length = 103 
Minus Strand HSPs: 

Score = 487 (73.1 bits), Expect = 4.4e-14, P = 4.4e-14

Identities = 101/104 (97%), Positives = 101/104 (97%), Strand = Minus / Plus

frame shift in primer binding site 
Medline entries



91250422: Purification and complete sequence determination of the major plasma membrane
substrate for cAMP-dependent protein kinase and protein kinase C in myocardium.

95091702: Protein kinase C and cyclic AMP-dependent protein kinase phosphorylate
phospholemman, an insulin and adrenaline-regulated membrane phosphoprotein, at specific
sites in the carboxy terminal domain.

95138184: Mat-8, a novel phospholemman-like protein expressed in human breast tumors,
induces a chloride conductance in Xenopus oocytes.



Peptide information for frame 2 

1 MELVLVFLCS LLAPMVLASA AEKEKEMDPF HYDYQTLRIG GLVFAVVLFS
51 VGILLILSRR CKCSFNQKPR APGDEEAQVE NLITANATEP QKAEN 

ORF from 32 bp to 316 bp; peptide length: 95 
Category: strong similarity to known protein 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82il7 , frame 2 

SWISSPROT:PLM_HUMAN PHOSPHOLEMMAN PRECURSOR., N = 1, Score = 196, P =
1.2e-15

TREMBL:AF091390_1 product: "phospholemman precursor"; Mus musculus
phospholemman precursor, gene, complete cds., N = 1, Score = 187, P =
1.1e-14

PIR:A40533 cAMP-dependent protein kinase major membrane substrate
precursor - dog, N = 1, Score = 189, P = 6.5e-15

SWISSPROT:PLM_RAT PHOSPHOLEMMAN PRECURSOR., N = 1, Score = 185, P =
1.7e-14

>SWISSPROT:PLM_HUMAN PHOSPHOLEMMAN PRECURSOR. 
Length = 92 

HSPs: 

Score = 196 (29.4 bits), Expect = 1.2e-15, P = 1.2e-15
Identities = 43/85 (50%), Positives = 56/85 (65%)

Query: 4 VLVFLCSLLAPMVLASAAEKEKEMDPFHYDYQTLRIGGLVFAVVLFSVGILLILSRRCKC 63

+LVF LL + AE KE DPF YDYQ+L+IGGLV A +LF +GIL++LSRRC+C 

Sbjct: 7 ILVFCVGLLT MAKAESPKEHDPFTYDYQSLQIGGLVIAGILFILGILIVLSRRCRC 62 

Query: 64 SFNQKPRA--PGDEEAQVENLITANAT 88

FNQ+ R P +EE +1 +T 
Sbjct: 63 KFNQQQRTGEPDEEEGTFRSSIRRLST 89 

Pedant information for DKFZphfbr2_82il7, frame 2 

Report for DKFZphfbr2_82il7.2

[LENGTH] 95 

[MW] 10542.37

[pI] 5.05

[HOMOL] SWISSPROT:PLM_HUMAN PHOSPHOLEMMAN PRECURSOR. 3e-15 

[BLOCKS] BL01310 
[EC] 3.6.1.37 Na+/K+-exchanging ATPase 6e-08

[PIRKW] transmembrane protein 1e-09

[PIRKW] hydrolase 6e-08

[PROSITE] ATP1G1_PLM_MAT8 1

[PROSITE] MYRISTYL 1

[PROSITE] CK2_PHOSPHO_SITE 1

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 2

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha Beta

[KW] SIGNAL_PEPTIDE 19





SEQ MELVLVFLCSLLAPMVLASAAEKEKEMDPFHYDYQTLRIGGLVFAVVLFSVGILLILSRR
PRD ccchhhhhhhhhhccccccccccccccccccceeeeecccceeeehhhhhhheeeeehhh

SEQ CKCSFNQKPRAPGDEEAQVENLITANATEPQKAEN
PRD hhhcccccccccccchhhhhhhhhhhccccccccc



Prosite for DKFZphfbr2_82il7.2



PS00001  86->90  ASN_GLYCOSYLATION  PDOC00001
PS00005  36->39  PKC_PHOSPHO_SITE   PDOC00005
PS00005  58->61  PKC_PHOSPHO_SITE   PDOC00005
PS00006  19->23  CK2_PHOSPHO_SITE   PDOC00006
PS00007  25->33  TYR_PHOSPHO_SITE   PDOC00007
PS00008  41->47  MYRISTYL           PDOC00008
PS01310  28->42  ATP1G1_PLM_MAT8    PDOC01014



(No Pfam data available for DKFZphfbr2_82il7.2)
DKFZphfbr2_82i24



group: nucleic acid management 

DKFZphfbr2_82i24 encodes a novel 547 amino acid protein with similarity to DEAD-box
superfamily ATP-dependent helicases.

RNA helicases comprise a large family of proteins that are involved in basic biological
processes such as nuclear and mitochondrial splicing, RNA editing, rRNA processing,
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential
factors in cell development and differentiation, and some of them play a role in transcription
and replication of viral single-stranded RNA genomes. The members of the largest subgroup, the
DEAD and DEAH box proteins, show a strong dependence of their unwinding activity on ATP
hydrolysis.

The novel protein contains a DEAD box, an ATP/GTP-binding site motif A (P-loop, interacting
with one of the phosphate groups of the nucleotide) and a leucine zipper. Mutations in the
closely related Drosophila Hlc gene result in lethality in homozygotes. The new protein
therefore appears to be critically involved in RNA processing in eukaryotic cells.

The new protein can find application in modulating RNA metabolism and gene expression. 



strong similarity to DEAD-box subfamily ATP-dependent helicase 
complete cDNA, complete cds, EST hits 

potential Start at Bp 9 matches Kozak consensus PyNNatgG, 
[PFAM] Helicases conserved C-terminal domain 
[PFAM] DEAD and DEAH box helicases 

Sequenced by DKFZ 

Locus: /map=*"720_A_3; 758_H_4; 772_E_3; 804_A_5; 175.5 cR from topFT of Chr7 linkage group" 
Insert length: 1860 bp 

Poly A stretch at pos. 1850, polyadenylation signal at pos. 1829 



1 AGCAGCGCCA TGGAGGACTC TGAAGCACTG GGCTTCGAAC ACATGGGCCT
51 CGATCCCCGG CTCCTTCAGG CTGTCACCGA TCTGGGCTGG TCGCGACCTA
101 CGCTGATCCA GGAGAAGGCC ATCCCACTGG CCCTAGAAGG GAAGGACCTC
151 CTGGCTCGGG CCCGCACGGG CTCCGGGAAG ACGGCCGCTT ATGCTATTCC
201 GATGCTGCAG CTGTTGCTCC ATAGGAAGGC GACAGGTCCG GTGGTAGAAC
251 AGGCAGTGAG AGGCCTTGTT CTTGTTCCTA CCAAGGAGCT GGCACGGCAA
301 GCACAGTCCA TGATTCAGCA GCTGGCTACC TACTGTGCTC GGGATGTCCG
351 AGTGGCCAAT GTCTCAGCTG CTGAAGACTC AGTCTCTCAG AGAGCTGTGC
401 TGATGGAGAA GCCAGATGTG GTAGTAGGGA CCCCATCTCG CATATTAAGC
451 CACTTGCAGC AAGACAGCCT GAAACTTCGT GACTCCCTGG AGCTTTTGGT
501 GGTGGACGAA GCTGACCTTC TTTTTTCCTT TGGCTTTGAA GAAGAGCTCA
551 AGAGTCTCCT CTGTCACTTG CCCCGGATTT ACCAGGCTTT TCTCATGTCA
601 GCTACTTTTA ACGAGGACGT ACAAGCACTC AAGGAGCTGA TATTACATAA
651 CCCGGTTACC CTTAAGTTAC AGGAGTCCCA GCTGCCTGGG CCAGACCAGT
701 TACAGCAGTT TCAGGTGGTC TGTGAGACTG AGGAAGACAA ATTCCTCCTG
751 CTGTATGCCC TGCTCAAGCT GTCATTGATT CGGGGCAAGT CTCTGCTCTT
801 TGTCAACACT CTAGAACGGA GTTACCGGCT ACGCCTGTTC TTGGAACAGT
851 TCAGCATCCC CACCTGTGTG CTCAATGGAG AGCTTCCACT GCGCTCCAGG
901 TGCCACATCA TCTCACAGTT CAACCAAGGC TTCTACGACT GTGTCATAGC
951 AACTGATGCT GAAGTCCTGG GGGCCCCAGT CAAGGGCAAG CGTCGGGGCC
1001 GAGGGCCCAA AGGGGACAAG GCCTCTGATC CGGAAGCAGG TGTGGCCCGG
1051 GGCATAGACT TCCACCATGT GTCTGCTGTG CTCAACTTTG ATCTTCCCCC
1101 AACCCCTGAG GCCTACATCC ATCGAGCTGG CAGGACAGCA CGCGCTAACA
1151 ACCCAGGCAT AGTCTTAACC TTTGTGCTTC CCACGGAGCA GTTCCACTTA
1201 GGCAAGATTG AGGAGCTTCT CAGTGGAGAG AACAGGGGCC CCATTCTGCT
1251 CCCCTACCAG TTCCGGATGG AGGAGATCGA GGGCTTCCGC TATCGCTGCA
1301 GGGATGCCAT GCGCTCAGTG ACTAAGCAGG CCATTCGGGA GGCAAGATTG
1351 AAGGAGATCA AGGAAGAGCT TCTGCATTCT GAGAAGCTTA AGACATACTT
1401 TGAAGACAAC CCTAGGGACC TCCAGCTGCT GCGGCATGAC CTACCTTTGC
1451 ACCCCGCAGT GGTGAAGCCC CACCTGGGCC ATGTTCCTGA CTACCTGGTT
1501 CCTCCTGCTC TCCGTGGCCT GGTACGCCCT CACAAGAAGC GGAAGAAGCT
1551 GTCTTCCTCT TGTAGGAAGG CCAAGAGAGC AAAGTCCCAG AACCCACTGC
1601 GCAGCTTCAA GCACAAAGGA AAGAAATTCA GACCCACAGC CAAGCCCTCC
1651 TGAGGTTGTT GGGCCTCTCT GGAGCTGAGC ACATTGTGGA GCACAGGCTT
1701 ACACCCTTCG TGGACAGGCG AGGCTCTGGT GCTTACTGCA CAGCCTGAAC
1751 AGACAGTTCT GGGGCCGGCA GTGCTGGGCC CTTTAGCTCC TTGGCACTTC
1801 CAAGCTGGCA TCTTGCCCCT TGACAACAGA ATAAAAATTT TAGCTGCCCC
1851 AAAAAAAAAA
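Each cDNA listing in this section numbers a row by the position of its first base and prints ten-base blocks, five per row. The hypothetical helper below rebuilds the contiguous sequence from such rows and raises an error if a row does not start where the previous one ended, which is a quick way to catch rows dropped or misordered during extraction. The stated insert length (1860 bp for this clone) can then be compared with the length of the result.

```python
"""Sketch: rebuild a contiguous cDNA from numbered sequence-listing rows."""

VALID_BASES = set("ACGTN")


def parse_numbered_rows(rows: list[str]) -> str:
    """Join rows such as '51 CGATCCCCGG CTCCTTCAGG ...' into one sequence.

    Each row must begin with the 1-based position of its first base; a
    mismatch with the running length indicates a missing or misordered row.
    """
    blocks: list[str] = []
    length = 0
    for row in rows:
        fields = row.split()
        if not fields:
            continue  # tolerate blank lines between rows
        start = int(fields[0])
        if start != length + 1:
            raise ValueError(f"row numbered {start}, expected {length + 1}")
        for block in fields[1:]:
            block = block.upper()
            if not set(block) <= VALID_BASES:
                raise ValueError(f"unexpected characters in block {block!r}")
            blocks.append(block)
            length += len(block)
    return "".join(blocks)
```

Fed the first two rows of this listing, the helper returns a 100-base string whose bases 10 to 12 (0-based slice `[9:12]`) are the `ATG` at which the DKFZphfbr2_82i24 ORF starts.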



BLAST Results 
Entry HSG05793 from database EMBL:
human STS WI-6581.
Length = 206
Minus Strand HSPs:

Score = 992 (148.8 bits), Expect = 6.0e-38, P = 6.0e-38

Identities = 204/208 (98%), Positives = 204/208 (98%), Strand = Minus / Plus

Entry AC004938 from database EMBL: 

Homo sapiens clone DJ0971C03; HTGS phase 1, 18 unordered pieces. 
Score = 1269, P = 6.5e-202, identities = 269/282
12 exons Bp -87920-93706 (matching 1-1497) 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 10 bp to 1650 bp; peptide length: 547 
Category: strong similarity to known protein 
Classification: Nucleic acid management 
Prosite motifs: ATP_GTP_A (51-59)
LEUCINE_ZIPPER (149-171) 



1 MEDSEALGFE HMGLDPRLLQ AVTDLGWSRP TLIQEKAIPL ALEGKDLLAR 

51 ARTGSGKTAA YAIPMLQLLL HRKATGPVVE QAVRGLVLVP TKELARQAQS 

101 MIQQLATYCA RDVRVANVSA AEDSVSQRAV LMEKPDVVVG TPSRILSHLQ 

151 QDSLKLRDSL ELLVVDEADL LFSFGFEEEL KSLLCHLPRI YQAFLMSATF

201 NEDVQALKEL ILHNPVTLKL QESQLPGPDQ LQQFQVVCET EEDKFLLLYA

251 LLKLSLIRGK SLLFVNTLER SYRLRLFLEQ FSIPTCVLNG ELPLRSRCHI 

301 ISQFNQGFYD CVIATDAEVL GAPVKGKRRG RGPKGDKASD PEAGVARGID 

351 FHHVSAVLNF DLPPTPEAYI HRAGRTARAN NPGIVLTFVL PTEQFHLGKI 

401 EELLSGENRG PILLPYQFRM EEIEGFRYRC RDAMRSVTKQ AIREARLKEI 

451 KEELLHSEKL KTYFEDNPRD LQLLRHDLPL HPAVVKPHLG HVPDYLVPPA 

501 LRGLVRPHKK RKKLSSSCRK AKRAKSQNPL RSFKHKGKKF RPTAKPS



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82i24, frame 1 

TREMBL:AF017777_10 gene: "hlc"; product: "helicase"; Drosophila
melanogaster tweety (tty), flightless (fli), dodo (dod), penguin (pen),
small optic lobes (sol), innocent bystander (iby), waclaw (waw), bobby
sox (bbx), sluggish (slg), helicase (hlc), misato (mst), and la costa
(lcs) genes, complete cds., N = 1, Score = 1230, P = 3.2e-125

TREMBL:SPCC1494_6 gene: "SPCC1494.06c"; product: "atp dependent
helicase"; S.pombe chromosome II cosmid C1494., N = 2, Score = 753, P =
2.5e-113

PIR:S51412 hypothetical protein YLR276C - yeast (Saccharomyces
cerevisiae), N = 2, Score = 711, P = 8.2e-117

TREMBL:AF025451_2 gene: "C24H12.4"; Caenorhabditis elegans cosmid
C24H12., N = 2, Score = 564, P = 2.7e-99



>TREMBL:AF017777_10 gene: "hlc"; product: "helicase"; Drosophila 

melanogaster tweety (tty), flightless (fli), dodo (dod), penguin (pen),
small optic lobes (sol), innocent bystander (iby), waclaw (waw), bobby sox 
(bbx), sluggish (slg), helicase (hlc), misato (mst), and la costa (lcs) 
genes, complete cds. 
Length = 560

HSPs: 

Score = 1230 (184.5 bits), Expect = 3.2e-125, P = 3.2e-125
Identities = 251/497 (50%), Positives = 344/497 (69%)
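The percentages printed after the identity and positive counts in these BLAST reports appear to be truncated, not rounded. For example, 38/103 is 36.9% but is printed as 36%, and 173/260 is 66.5% but is printed as 66%. The following short Python sketch assumes integer truncation. It reproduces the figures given in this listing. The function name is hypothetical.

```python
def blast_percent(matches: int, length: int) -> int:
    """Percentage as it appears to be printed in these reports:
    100 * matches / length, truncated toward zero (assumption)."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * matches) // length


if __name__ == "__main__":
    # (matches, alignment length, percentage printed in this listing)
    reported = [(204, 208, 98), (251, 497, 50), (344, 497, 69),
                (38, 103, 36), (173, 260, 66), (444, 456, 97)]
    for matches, length, shown in reported:
        print(matches, length, blast_percent(matches, length), shown)
```

Rounding would instead give 37% for 38/103 and 67% for 173/260, which would contradict the printed values.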
[Alignment with TREMBL:AF017777_10 not recoverable from the source. Query blocks begin at residues 9, 69, 128, 188, 248, 308, 368, 424 and 484; Sbjct blocks begin at residues 11, 71, 129, 189, 248, 308, 366, 426 and 486.]



Pedant information for DKFZphfbr2_82i24, frame 1 



Report for DKFZphfbr2_82i24.1



[LENGTH] 547 

[MW] 61589.88 

[pI] 9.34

[HOMOL] TREMBL:AF017777_10 gene: "hlc"; product: "helicase"; Drosophila melanogaster
tweety (tty), flightless (fli), dodo (dod), penguin (pen), small optic lobes (sol), innocent
bystander (iby), waclaw (waw), bobby sox (bbx), sluggish (slg), helicase (hlc), misato (mst),
and la costa (lcs) genes, complete cds. 1e-121



[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR276c] 1e-109

[FUNCAT] mrna translation and ribosome biogenesis [H. influenzae, HI0231] 2e-42

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YLL008w] 8e-40

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 8e-40

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YLL008w] 8e-40

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YKR059w] 3e-39

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKR059w] 3e-39

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 3e-35

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119c] 3e-29

[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 4e-29

[FUNCAT] 1 genome replication, transcription, recombination and repair [H. influenzae, HI0892] 1e-27

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 2e-27

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 4e-21

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 1e-05

[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins

[PIRKW] nucleus 4e-34

[PIRKW] RNA binding 7e-41

[PIRKW] DEAD box 2e-38

[PIRKW] transmembrane protein 9e-20

[PIRKW] DNA binding 8e-23

[PIRKW] ATP 1e-107

[PIRKW] purine nucleotide binding 2e-38

[PIRKW] P-loop 1e-107

[PIRKW] hydrolase 2e-35

[PIRKW] protein biosynthesis 2e-38

[PIRKW] ATP binding 7e-43
[SUPFAM] WW repeat homology 1e-26

[SUPFAM] DEAD/H box helicase homology 1e-107

[SUPFAM] unassigned DEAD/H box helicases 1e-107

[SUPFAM] ATP-dependent RNA helicase DBP1 3e-31

[SUPFAM] ATP-dependent RNA helicase DHH1 2e-35

[SUPFAM] translation initiation factor eIF-4A 2e-38

[SUPFAM] tobacco ATP-dependent RNA helicase DB10 1e-26

[PROSITE] ATP_GTP_A 1

[PROSITE] LEUCINE_ZIPPER 1

[PFAM] Helicases conserved C-terminal domain

[PFAM] DEAD and DEAH box helicases

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 9.87 %
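The [KW] LOW_COMPLEXITY figure appears to be the share of residues masked by SEG, shown as the x characters in the SEG rows of each report. For DKFZphfbr2_82i24, the SEG rows mask 12 + 11 + 13 + 18 = 54 of 547 residues, which gives the 9.87 % shown. The following minimal Python sketch works under that assumption. The function name is hypothetical.

```python
def low_complexity_percent(seg_rows: list[str], length: int) -> float:
    """Percentage of residues masked by SEG, rounded to two decimals.

    seg_rows: the SEG annotation rows, where 'x' (either case) marks a
    masked residue. length: total length of the protein.
    Assumption: this is how the [KW] LOW_COMPLEXITY value is derived.
    """
    if length <= 0:
        raise ValueError("protein length must be positive")
    masked = sum(row.lower().count("x") for row in seg_rows)
    return round(100 * masked / length, 2)


if __name__ == "__main__":
    # Masked-residue counts read off the SEG rows of DKFZphfbr2_82i24.
    rows = ["x" * 12, "x" * 11, "x" * 13, "x" * 18]
    print(low_complexity_percent(rows, 547))
```

The same calculation yields the 6.57 % reported for the 289-residue DKFZphfbr2_82ml6 product (19 masked residues) and the 20.18 % reported for the 654-residue DKFZphfbr2_82m6 product (132 masked residues).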



SEQ MEDSEALGFEHMGLDPRLLQAVTDLGWSRPTLIQEKAIPLALEGKDLLARARTGSGKTAA 

SEG 

PRD ccccccccccccccchhhhhhhhhhccccccccccccccccccccceeeeecccccccee 

SEQ YAIPMLQLLLHRKATGPVVEQAVRGLVLVPTKELARQAQSMIQQLATYCARDVRVANVSA 

SEG 

PRD ehhhhhhhhhhhcccccccccceeeeeeccchhhhhhhhhhhhhhhhhhhcceeeeeecc 

SEQ AEDSVSQRAVLMEKPDVVVGTPSRILSHLQQDSLKLRDSLELLVVDEADLLFSFGFEEEL

SEG xxxxxxxxxxxx 

PRD ccchhhhhhhhhcccceeeeccccchhhhhhcccccchhhhhhhhhhhhhhhhhcchhhh 

SEQ KSLLCHLPRIYQAFLMSATFNEDVQALKELILHNPVTLKLQESQLPGPDQLQQFQVVCET 

SEG 

PRD hhhhhhccchhhhhhhhhccchhhhhhhhhhhcccceeeeeccccccchhhhhhhhhhhh 

SEQ EEDKFLLLYALLKLSLIRGKSLLFVNTLERSYRLRLFLEQFSIPTCVLNGELPLRSRCHI 

SEG xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccceeeeeeehhhhhhhhhhhhhhcccceeeccccchhhhhhhh 

SEQ ISQFNQGFYDCVIATDAEVLGAPVKGKRRGRGPKGDKASDPEAGVARGIDFHHVSAVLNF 

SEG xxxxxxxxxxxxx 

PRD hhhhhccceeeeeeccccccccccccccccccccccccccccccccccccccceeeeeec 

SEQ DLPPTPEAYIHRAGRTARANNPGIVLTFVLPTEQFHLGKIEELLSGENRGPILLPYQFRM 

SEG

PRD ccccccceeeeccccccccccccceeeeeecchhhhhhhhhhhhhhhccccccccccchh 

SEQ EEIEGFRYRCRDAMRSVTKQAIREARLKEIKEELLHSEKLKTYFEDNPRDLQLLRHDLPL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhccc 

SEQ HPAVVKPHLGHVPDYLVPPALRGLVRPHKKRKKLSSSCRKAKRAKSQNPLRSFKHKGKKF 

SEG xxxxxxxxxxxxxxxxxx 

PRD cccccccccccccceeeccccccccccccccccccchhhhhhcccccccccccccccccc 

SEQ RPTAKPS 

SEG 

PRD ccccccc 



Prosite for DKFZphfbr2_82i24.1

PS00017 51->59 ATP_GTP_A PDOC00017 

PS00029 149->171 LEUCINE_ZIPPER PDOC00029
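The Prosite hits are listed as start->end. Comparing them with the peptide suggests that start is the 1-based position of the first matched residue and end is one past the last. For example, PS00017 ATP_GTP_A has the PROSITE pattern [AG]-x(4)-G-K-[ST] and matches eight residues (ARTGSGKT), yet it is reported as 51->59. The following Python sketch reproduces that convention.

The pattern-to-regex converter is a simplified stand-in that ignores '<' and '>' inside brackets. It is not the scanner used to produce these reports. Overlapping hits are kept, as in the CK2_PHOSPHO_SITE hits at 234->238 and 236->240 reported for DKFZphfbr2_82m6.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    p = pattern.rstrip(".")
    p = p.replace("{", "[^").replace("}", "]")   # {P} -> [^P]: excluded residues
    p = p.replace("(", "{").replace(")", "}")    # x(4) -> x{4}: repeat counts
    p = p.replace("x", ".")                      # x -> any residue
    p = p.replace("<", "^").replace(">", "$")    # N-/C-terminal anchors
    return p.replace("-", "")                    # drop element separators


def find_motifs(seq: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) per hit: 1-based start, end one past the last residue."""
    # A lookahead wrapper lets re.finditer report overlapping hits.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1)) + 1)
            for m in regex.finditer(seq)]


if __name__ == "__main__":
    # First 60 residues of the DKFZphfbr2_82i24 frame-1 peptide.
    head = "MEDSEALGFEHMGLDPRLLQAVTDLGWSRPTLIQEKAIPLALEGKDLLARARTGSGKTAA"
    print(find_motifs(head, "[AG]-x(4)-G-K-[ST]"))  # ATP_GTP_A (P-loop)
```

Run on the first 60 residues of the DKFZphfbr2_82ml6 product, the N-glycosylation pattern N-{P}-[ST]-{P} gives (17, 21) and (51, 55). These match the first two ASN_GLYCOSYLATION rows of that entry's Prosite table.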



Pfam for DKFZphfbr2_82i24.1



HMM_NAME DEAD and DEAH box helicases

HMM         *gLpPWILRnIyeMGFEkPTPIQQqAIPiILeGRDVMACAQTGSGKTAAF
             GL+P +L +++++G+++PT IQ++AIP++LEG+D++A+A TGSGKTAA+
Query   13   GLDPRLLQAVTDLGWSRPTLIQEKAIPLALEGKDLLARARTGSGKTAAY 61


HMM          UPMLQHIDwdP...WpqpPQdPrALILAPTRELAMQIQEEcRkFgkHMn
             +IPMLQ +++ + + + +R+L+L+PT ELA+Q Q +++++ ++
Query   62   AIPMLQLLLHRKATGPVVEQA-VRGLVLVPTKELARQAQSMIQQLATYCA 110


HMM          g. IRImcIYGGtnMRdQMRmLeRGpPHIVIATPGRXiIDHIERgtldLDr .
             +R++ + + Q +L+++P ++V++TP R++ H+++ +L+L++
Query  111   RDVRVANVSAAEDSVSQRAVLMEKP-DVVVGTPSRILSHLQQDSIiKLRDS 159


HMM          IeMLVMDEADRMLDMGFIDQIRrlMrqIPMpwNRQTMMFSATMPdelqEL
             +E LV DEAD +++ GF++++ ++ ++P + Q + SAT+ +++Q L
Query  160   LELLVVDEADLLFSFGFEEELKSLLCHLP--RIYQAFLMSATFNEDVQAL 207

HMM ARrFMRNPIRInldMdElTtnEnlkQwYiyVerEMWKfdcLcrLIe* 

+ +++NP+ + + +++L + ++Q+ +++E E++KF +L+ L++ 
Query 208 KELILHNPVTLKLQESQLPGPDQLQQFQVVCETEEDKFLLLYALLK 253 

HMM_NAME Helicases conserved C-terminal domain

HMM *EileeWLknlGIrvmYIHGdMpQeERdelMddFNnGEynVLIcTDV...

+L+ +L++ I+++++ G +P + R 1+ +FN+G Y++ I+TD+ 
Query 272 YRLRLFLEQFSIPTCVLNGELPLRSRCHIISQFNQGFYDCVIATDAEVL 320 

HMM ggRGI DI PdVNHVINYDMPWNPEqYI 

+RGID+ V+ V N+D+P +PE YI 
Query 321 GAPVKGKRRGRGPKGDKASDPEAGVARGIDFHHVSAVLNFDLPPTPEAYI 370 

HMM QRIGRTgRIG* 

+R+GRT+R++ 
Query 371 HRAGRTARAN 380
DKFZphfbr2_82ml6



group: brain derived 

DKFZphfbr2_82ml6 encodes a novel 289-amino-acid protein with very weak similarity to the
A. thaliana protein F28A23.140.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein may be useful for studying the expression profile of brain-specific
genes.

similarity to A.thaliana F28A23.140 

complete cDNA, complete cds, few EST hits 
many ATGs in front of the ORF 

TRANSMEMBRANE 1 
Sequenced by DKFZ 
Locus: /map="4"
Insert length: 2715 bp 

Poly A stretch at pos. 2705, polyadenylation signal at pos. 2687

1 AGAGGAGGGG AGAGGACTGG GGAGCCGAGC CAGAGCCGGG CTGCCTGCCA 

51 CCCGGCTGCT CGTCCGCTAG CTGGGGAGGA GCGCTCCACC CGCAACTGAC 

101 AAAGGATGGG AGAATGCCCG CGCCCCGGGA TGCCGGCCGC ACGCAGCCTG 

151 GCGGCCGCCT GAGCTACTTC ACCCTCCGCC GGTAAGTGAC TGCAAACATC 

201 ATTCATTCAA TCAGCCTCAC TGGGAGCCCC TTCTCTCCGG CTGGTAGTCC 

251 TGGGCGGCTT GTCCCTGATC CCGAGCGGGG CTTGGCACAG CATCAGCCCT 

301 GGAGGGCAGG CAGCAGGTGC CTTTGCCTGG TGGGTCCACT GGGGAGCGTG 

351 GCTGGGGTTC GCGGCGGGTG CTGCCACCCA ACCTGCGGGC GGCGGGCTCG 

401 CCCAGTAGGC GCCTCTCTGG TGAGAGGAGG CGGCTCCAGC CCGCATCCTG 

451 GGGTAGTTGC TACTATTGGC CCCCAGCGCC CGCTCTGCGC GCGCGCCGTT

501 TCTGGCGGAT CCCCAGTGCG CGGCGCGCTG TTTACACCGG CGTGGTACTA

551 GTCACGGAGC CGCACCCCTC GGAAAGCGCG GAGTCGATGA CAGCCACTTC 

601 ACAGGCTCAC GCGCTCCTAG TGTGGGCTTG AAGGGGACGG GGACCGATTA 

651 CCAAAGGAGA GCGCTGAGTA CGGAAGACAC AGGGCAGCCT TTGTCTTGGG 

701 TTTAGCGCTG ATGCGCTCAA CCCTGAGTCG GGTTCACTGC AACTGTTGTG 

751 TCCGATTTCG GTTCCCTGCA ACCGCCCTCC TGGGCGAGAG ATGTCATTGT 

801 GTTCCTGCGG CCAGCGGGAC TGAGAGCTGG GACTTAAGAC GCCAGGAGGG 

851 TCCTGCGCTC ACGGGAAATG TACCCCAAAA GAACTCTGAG AGAATATACT 

901 CAACTGTCCT GCTGTGATTA AACAAGACTG CTGTATTTTA ATTTCAGAAA 

951 TTGAAAAGGG ATAGGAGGAA GGGGAAAATG CTGGGCTGGT GTGAAGCGAT 

1001 AGCCCGTAAC CCTCACAGAA TTCCAAACAA CACGCGAACA CCCGAGATCT 

1051 CAGGGGATTT GGCTGACGCC TCACAAACCT CCACATTGAA TGAAAAATCC 

1101 CCAGGGCGAT CTGCAAGTCG ATCAAGTAAC ATTTCAAAAG CAAGCAGCCC 

1151 AACAACAGGG ACAGCTCCCA GGAGCCAGTC AAGGTTGTCT GTCTGTCCAT 

1201 CCACTCAGGA CATCTGCAGA ATCTGTCACT GCGAAGGGGA TGAAGAGAGC 

1251 CCCCTCATCA CACCCTGTCG CTGCACTGGG ACACTGCGCT TTGTCCACCA 

1301 GTCCTGCCTC CACCAGTGGA TAAAGAGCTC AGATACACGC TGCTGTGAGC 

1351 TCTGCAAGTA TGACTTCATA ATGGAGACCA AGCTCAAACC CCTCCGGAAG 

1401 TGGGAGAAAC TACAGATGAC CACAAGTGAA AGGAGGAAAA TATTCTGCTC 

1451 TGTCACATTC CACGTAATCG CGATCACCTG TGTGGTTTGG TCTTTGTATG 

1501 TATTGATAGA CCGGACAGCG GAGGAAATCA AGCAAGGCAA TGACAATGGT 

1551 GTCCTTGAAT GGCCATTTTG GACAAAACTG GTTGTGGTAG CCATTGGCTT 

1601 CACAGGAGGT CTTGTCTTCA TGTACGTACA GTGTAAAGTC TATGTTCAGT 

1651 TGTGGCGCAG GCTGAAGGCC TACAACCGTG TGATCTTTGT ACAAAATTGC 

1701 CCAGACACTG CCAAAAAACT GGAGAAGAAC TTCTCATGTA ATGTAAACAC 

1751 AGACATCAAA GATGCTGTGG TAGTGCCTGT ACCACAAACA GGTGCAAATT 

1801 CACTGCCATC TGCAGAGGGT GGCCCCCCTG AAGTTGTATC AGTCTGATGG 

1851 AACCTGTTGG GAGTTTCTTC ACCGAAGAAT ATCTTTCTAG CCCTCAGCCA 

1901 CTACAAATGA CAGAAGTGAC CTTGAATTAT TTACTCCCTT CAGCTCCTCC 

1951 TTTCTCCTAC TGACACATTT TTCCTGACTT TGTTCAAAGA GGAAAGGAGA 

2001 AAAACAAACA AACAGACCAA ATGCCCAGGA GCCCATGAAG TAATAGCGTA 

2051 AAGTAAAGTA TGATATGGAA ATGTGAAGTT TGCAAGAGAA TGATTTCCAA 

2101 GACAATTAAG AACTACTGGG GCAATGAATG CTTTTAGGCA GTAATCAAAG 

2151 ATTAAATGGA CCCATGATAC TCTTCTTCAC AGTAACAGGG GAAAAGTTCA 

2201 AGAATACAGA CTTGAATTGC GATGTGTATT ACTTCTAGGG CCTTGTAATG 

2251 TTAACTGTCT CATCTGGAAA TAATAACTAA CATATTTGGT TTTAAGCCTG 

2301 AAATTGTCTG CATTATCCCT AAGTCACATT GGAAGTGAAC TTGGAGGATG 

2351 CATATTTTGA TATGCTTTGA CAGCTAACAG ATTTGTATGG TTTAGTGGAG 

2401 TCTGGTTATT TTGACAGATG CATGTTTTTT TTAAATAGAT GCAATATACA 

2451 TTTGAAGACA TTGATATTTG GAATTAATTA TGTTTGTTTA AGTCACGCAA

2501 AAGATTTTCA GAAAATGTTC GGATATAATT AGCTCTGTTA AATACCCACA 

2551 GAACTGTTAT CAGGTCTTAT ATTTATTTTC ATCTGGTTCC TCTAATACAG 
2601 TGCTGTCCAA TAGAAACACA ACAGCCACAA ATGCAGGCCA CAGATGCAAA
2651 TATTTAACTT CCCAGTAGCC CTATTTTAAA AAGTAAAAAT AAATGTTTGT 
2701 TTGTTAAAAA AAAAA 



BLAST Results 



Entry G37457 from database EMBLNEW:
SHGC-57357 Human Homo sapiens STS genomic. 
Length = 458
Plus Strand HSPs: 

Score = 2116 (317.5 bits), Expect = 4.3e-91, P = 4.3e-91 
Identities = 444/456 (97%)



Medline entries 



No Medline entry 



Peptide information for frame 3 



1 MLGWCEAIAR NPHRIPNNTR TPEISGDLAD ASQTSTLNEK SPGRSASRSS

51 NISKASSPTT GTAPRSQSRL SVCPSTQDIC RICHCEGDEE SPLITPCRCT 

101 GTLRFVHQSC LHQWIKSSDT RCCELCKYDF IMETKLKPLR KWEKLQMTTS 

151 ERRKIFCSVT FHVIAITCVV WSLYVLIDRT AEEIKQGNDN GVLEWPFWTK 

201 LVVVAIGFTG GLVFMYVQCK VYVQLWRRLK AYNRVIFVQN CPDTAKKLEK

251 NFSCNVNTDI KDAVVVPVPQ TGANSLPSAE GGPPEVVSV

ORF from 978 bp to 1844 bp; peptide length: 289 
Category: similarity to unknown protein 
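The frame-3 peptide can be checked against the insert sequence by translating from the stated ORF start. The following minimal Python sketch uses the standard genetic code (NCBI translation table 1), encoded as the usual 64-letter string in TCAG order. The helper names are illustrative. Bases 978-1007 of the insert listed above (ATGCTGGGCTGGTGTGAAGCGATAGCCCGT) translate to the first ten residues of the peptide.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1): first, second and third codon base
# each run over T, C, A, G; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nucleotides: str) -> str:
    """Translate a nucleotide string in frame from its first base.

    Whitespace is ignored, so sequence copied in the 10-base groups of the
    listing can be passed directly. Codons containing other characters
    become 'X'; a trailing partial codon is dropped.
    """
    seq = "".join(nucleotides.split()).upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    print(translate("ATGCTGGGCTGGTGTGAAGCGATAGCCCGT"))
```

This prints MLGWCEAIAR. Applied at base 270 of the DKFZphfbr2_82m6 insert, the same function gives MNGHLEA..., the start of that entry's peptide.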



BLASTP hits 
Entry AB011169_1 from database TREMBL: 

gene: "KIAA0597"; product: "KIAA0597 protein"; Homo sapiens mRNA for 

KIAA0597 protein, partial cds. 

Score = 188, P = 6.0e-12, identities = 30/54, positives = 38/54
Entry SPBC14F5_7 from database TREMBL: 

gene: "SPBC14F5.07"; product: "hypothetical protein"; S.pombe
chromosome II cosmid c14F5.

Score = 185, P = 1.9e-11, identities = 29/53, positives = 38/53
Entry CEY57A10B_1 from database TREMBL: 

gene: "Y57A10B.1"; Caenorhabditis elegans cosmid Y57A10B 

Score = 171, P = 2.6e-10, identities = 40/107, positives = 58/107



Alert BLASTP hits for DKFZphfbr2_82ml6, frame 3 

TREMBL:ATF28A23_14 gene: "F28A23.140"; product: "putative protein";

Arabidopsis thaliana DNA chromosome 4, BAC clone F28A23 (ESSAII

project), N = 1, Score = 198, P = 3.4e-13



>TREMBL:ATF28A23_14 gene: "F28A23.140"; product: "putative protein";

Arabidopsis thaliana DNA chromosome 4, BAC clone F28A23 (ESSAII project)
Length = 1,051

HSPs: 

Score = 198 (29.7 bits), Expect = 3.4e-13, P = 3.4e-13
Identities = 38/103 (36%), Positives = 61/103 (59%)

Query: 28 LADASQTSTLNEKSPGRSASRS-SNISKASSPTTGTAPRSQSRLSVCPSTQDICRICHCE 86
          +++ S +S+ +  SP +++    SN+  A S  TG+              +D+CRIC
Sbjct: 20 VSEPSVSSSSSSSSPNQASPNPFSNMDPAVSTATGSRYVDDDE-----DEEDVCRICRNP 74

Query: 87 GDEESPLITPCRCTGTLRFVHQSCLHQWIKSSDTRCCELCKYDF 130
          GD ++PL  PC C+G+++FVHQ CL QW+  S+ R CE+CK+ F
Sbjct: 75 GDADNPLRYPCACSGSIKFVHQDCLLQWLNHSNARQCEVCKHPF 118
Pedant information for DKFZphfbr2_82ml6, frame 3



Report for DKFZphfbr2_82ml6.3



[LENGTH] 289 

[MW] 32308.36 

[pI] 8.76

[HOMOL] PIR:T00268 hypothetical protein KIAA0597 - human (fragment) 9e-14 

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YIL030c] 4e-09 

[PIRKW] transmembrane protein 9e-08

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 4 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 3 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 6.57 % 



SEQ MLGWCEAIARNPHRIPNNTRTPEISGDLADASQTSTLNEKSPGRSASRSSNISKASSPTT

SEG xxxxxxxxxxxxxxxxxxx

PRD ccchhhhhhccccccccccccccccchhhhhhhhhccccccccccccccccccccccccc 

SEQ GTAPRSQSRLSVCPSTQDICRICHCEGDEESPLITPCRCTGTLRFVHQSCLHQWIKSSDT 

SEG 

PRD ccccccccccccccccceeeeeeecccccccccccccccccceeeeehhhhhhhhhcccc 

SEQ RCCELCKYDFIMETKLKPLRKWEKLQMTTSERRKIFCSVTFHVIAITCVVWSLYVLIDRT 

SEG 

PRD ceeeeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc 

SEQ AEEIKQGNDNGVLEWPFWTKLVVVAIGFTGGLVFMYVQCKVYVQLWRRLKAYNRVIFVQN

SEG 

PRD ccccccccccceeehhhhheeeeeeecccccceeeeehhhhhhhhhhhhhhhheeeeeee 

SEQ CPDTAKKLEKNFSCNVNTDIKDAVVVPVPQTGANSLPSAEGGPPEVVSV

SEG 

PRD ccchhhhhhccccccccccceeeeeeecccccccccccccccccccccc 



Prosite for DKFZphfbr2_82ml6.3



PS00001 17->21 ASN_GLYCOSYLATION PDOC00001

PS00001 51->55 ASN_GLYCOSYLATION PDOC00001

PS00001 251->255 ASN_GLYCOSYLATION PDOC00001

PS00005 102->105 PKC_PHOSPHO_SITE PDOC00005

PS00005 150->153 PKC_PHOSPHO_SITE PDOC00005

PS00005 244->247 PKC_PHOSPHO_SITE PDOC00005

PS00006 36->40 CK2_PHOSPHO_SITE PDOC00006

PS00006 75->79 CK2_PHOSPHO_SITE PDOC00006

PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006

PS00006 180->184 CK2_PHOSPHO_SITE PDOC00006

PS00007 121->129 TYR_PHOSPHO_SITE PDOC00007

PS00008 187->193 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_82ml6.3)
DKFZphfbr2_82m6



group: signal transduction 

DKFZphfbr2_82m6.3 encodes a novel 654-amino-acid protein with similarity to murine sphingosine
kinase.

Sphingosine kinase is a new type of lipid kinase that is regulated by growth factors. The
enzyme phosphorylates sphingosine to sphingosine 1-phosphate (SPP), which acts both inside and
outside the cell. Intracellularly, SPP promotes proliferation and inhibits apoptosis. In yeast,
survival of cells exposed to heat shock depends on SPP. Extracellularly, SPP inhibits cell
motility and influences cell morphology, effects that appear to be mediated by the G
protein-coupled receptor EDG1.

The new protein may be useful for modulating or blocking the sphingosine kinase intracellular
signaling pathway.



strong similarity to mouse "sphingosine kinase" 

complete cDNA, complete cds, EST hits, 
YLR260w/YOR171c Lcb5p/Lcb4p = long chain base kinases,
involved in biosynthesis of sphingolipids 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2875 bp 

Poly A stretch at pos. 2865, polyadenylation signal at pos. 2838 
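The poly(A) and signal positions in these entries line up with the listed sequence if they are read as 0-based offsets. In the DKFZphfbr2_82m6 insert, AATAAA begins at base 2839 and the terminal A run at base 2866 of the numbered listing, one more than the reported 2838 and 2865. The same offset holds for DKFZphfbr2_82ml6, where the listed bases are 2688 and 2706 against reported values of 2687 and 2705. This convention is an inference from these two entries, not a documented one.

The following Python sketch works under that inference. It finds the terminal A run and the last AATAAA upstream of it. It looks only for the canonical AATAAA hexamer, so variant signals such as ATTAAA would be missed. The function name is hypothetical.

```python
def polya_features(tail: str, first_base: int = 1) -> tuple:
    """Locate the terminal poly(A) run and the nearest upstream AATAAA.

    tail: a stretch of the insert (whitespace ignored) whose first base is
    at 1-based position first_base in the numbered listing.
    Returns (poly_a, signal) as 0-based offsets into the full insert, the
    convention the reported values appear to follow; None if not found.
    """
    seq = "".join(tail.split()).upper()
    offset = first_base - 1
    run_start = len(seq.rstrip("A"))              # index of the first base of the A run
    poly_a = offset + run_start if run_start < len(seq) else None
    idx = seq.rfind("AATAAA", 0, run_start)       # last signal wholly upstream of the run
    signal = offset + idx if idx >= 0 else None
    return poly_a, signal


if __name__ == "__main__":
    # Bases 2801-2875 of the DKFZphfbr2_82m6 insert, as listed.
    tail = ("CCGTCCCCAA TCTAAAAAGC AATTGAAAAG GTCTATGCAA TAAAGGCAGT "
            "CGCTTCATTC CTCTCAAAAA AAAAA")
    print(polya_features(tail, first_base=2801))
```

Run as a script, this prints (2865, 2838), which matches the positions stated above.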



1 AGTGTTGGAG GTGAGGAGGC GGGGCTGGCA GGGCTAGTCG GGGCATCTGG 

51 AAATTTCCGA CCCCACGCTT CGGGCGTTTC CTTATCAGGT TCACCGCTCC 

101 CTGATCTCGC GCTGCACTTC GTAGGCGCAG CCGCTGCTTG GGAAGTCCTA 

151 CTTAAGAGCT GAAGGTCAGG CCAGGACAGT GAGACCTGAC TCCTTGCTCC 

201 TACCAGCCTA CTATGGCTTA AGACCCAGGG CCAGGGTCCC GTTGATGTAA 

251 CAGAGCAGAG GACCAGCAGA TGAATGGACA CCTTGAAGCA GAGGAGCAGC

301 AGGACCAGAG GCCAGACCAG GAGCTGACCG GGAGCTGGGG CCACGGGCCT 

351 AGGAGCACCC TGGTCAGGGC TAAGGCCATG GCCCCGCCCC CACCGCCACT 

401 GGCTGCCAGC ACCTCGCTCC TCCATGGCGA GTTTGGCTCC TACCCAGCCC 

451 GAGGCCCACG CTTTGCCCTC ACCCTTACAT CGCAGGCCCT GCACATACAG 

501 CGGCTGCGCC CCAAACCTGA AGCCAGGCCC CGGGGTGGCC TGGTCCCGTT 

551 GGCCGAGGTC TCAGGCTGCT GCACCCTGCG AAGCCGCAGC CCCTCAGACT 

601 CAGCGGCCTA CTTCTGCATC TACACCTACC CTCGGGGCCG GCGCGGGGCC 

651 CGGCGCAGAG CCACTCGCAC CTTCCGGGCA GATGGGGCCG CCACCTACGA 

701 AGAGAACCGT GCCGAGGCCC AGCGCTGGGC CACTGCCCTC ACCTGTCTGC 

751 TCCGAGGACT GCCACTGCCC GGGGATGGGG AGATCACCCC TGACCTGCTA 

801 CCTCGGCCGC CCCGGTTGCT TCTATTGGTC AATCCCTTTG GGGGTCGGGG 

851 CCTGGCCTGG CAGTGGTGTA AGAACCACGT GCTTCCCATG ATCTCTGAAG 

901 CTGGGCTGTC CTTCAACCTC ATCCAGACAG AACGACAGAA CCACGCCCGG 

951 GAGCTGGTCC AGGGGCTGAG CCTGAGTGAG TGGGATGGCA TCGTCACGGT 

1001 CTCGGGAGAC GGGCTGCTCC ATGAGGTGCT GAACGGGCTC CTAGATCGCC 

1051 CTGACTGGGA GGAAGCTGTG AAGATGCCTG TGGGCATCCT CCCCTGCGGC 

1101 TCGGGCAACG CGCTGGCCGG AGCAGTGAAC CAGCACGGGG GATTTGAGCC 

1151 AGCGCTGGGC CTCGACCTGT TGCTCAACTG CTCACTGTTG CTGTGCCGGG 

1201 GTGGTGGCCA CCCACTGGAC CTGCTCTCCG TGACGCTGGC CTCGGGCTCC 

1251 CGCTGTTTCT CCTTCCTGTC TGTGGCCTGG GGCTTCGTGT CAGATGTGGA 

1301 TATCCAGAGC GAGCGCTTCA GGGCCTTGGG CAGTGCCCGC TTCACACTGG 

1351 GCACGGTGCT GGGCCTCGCC ACACTGCACA CCTACCGCGG ACGCCTCTCC 

1401 TACCTCCCCG CCACTGTGGA ACCTGCCTCG CCCACCCCTG CCCATAGCCT 

1451 GCCTCGTGCC AAGTCGGAGC TGACCCTAAC CCCAGACCCA GCCCCGCCCA 

1501 TGGCCCACTC ACCCCTGCAT CGTTCTGTGT CTGACCTGCC TCTTCCCCTG 

1551 CCCCAGCCTG CCCTGGCCTC TCCTGGCTCG CCAGAACCCC TGCCCATCCT 

1601 GTCCCTCAAC GGTGGGGGCC CAGAGCTGGC TGGGGACTGG GGTGGGGCTG 

1651 GGGATGCTCC GCTGTCCCCG GACCCACTGC TGTCTTCACC TCCTGGCTCT 

1701 CCCAAGGCAG CTCTACACTC ACCCGTCTCC GAAGGGGCCC CCGTAATTCC 

1751 CCCATCCTCT GGGCTCCCAC TTCCCACCCC TGATGCCCGG GTAGGGGCCT 

1801 CCACCTGCGG CCCGCCCGAC CACCTGCTGC CTCCGCTAGG CACCCCGCTG 

1851 CCCCCAGACT GGGTGACGCT GGAGGGGGAC TTTGTGCTCA TGTTGGCCAT 

1901 CTCGCCCAGC CACCTAGGCG CTGACCTGGT GGCAGCTCCG CATGCGCGCT 

1951 TCGACGACGG CCTGGTGCAC CTGTGCTGGG TGCGTAGCGG CATCTCGCGG 

2001 GCTGCGCTGC TGCGCCTTTT CTTGGCCATG GAGCGTGGTA GCCACTTCAG 

2051 CCTGGGCTGT CCGCAGCTGG GCTACGCCGC GGCCCGTGCC TTCCGCCTAG 

2101 AGCCGCTCAC ACCACGCGGC GTGCTCACAG TGGACGGGGA GCAGGTGGAG 

2151 TATGGGCCGC TACAGGCACA GATGCACCCT GGCATCGGTA CACTGCTCAC 

2201 TGGGCCTCCT GGCTGCCCGG GGCGGGAGCC CTGAAACTAA ACAAGCTTGG 

2251 TACCCGCCGG GGGCGGGGCC TACATTCCAA TGGGGCGGAG CCTGAGCTAG 

2301 GGGGTGTGGC CTGGCTGCTA GAGTTGTGGT GGCAGGGGCC CTGGCCCCGT
2351 CTCAGGATTG CGCTCGCTTT CATGGGACCA GACGTGATGC TGGAAGGTGG

2401 GCGTCGTCAC GGTTAAAGAG AAATGGGCTC GTCCCGAGGG TAGTGCCTGA 

2451 TCAATGAGGG CGGGGCCTGG CGTCTGATCT GGGGCCGCCC TTACGGGGCA 

2501 GGGCTCAGTC CTGACGCTTG CCACCTGCTC CTACCCGGCC AGGATGGCTG 

2551 AGGGCGGAGT CTATTTTACG CGTCGCCCAA TGACAGGACC TGGAATGTAC 

2601 TGGCTGGGGT AGGCCTCAGT GAGTCGGCCG GTCAGGGCCC GCAGCCTCGC 

2651 CCCATCCACT CCGGTGCCTC CATTTAGCTG GCCAATCAGC CCAGGAGGGG 

2701 CAGGTTCCCC GGGGCCGGCG CTAGGATTTG CACTAATGTT CCTCTCCCCG 

2751 CGGGTGGGGG CGGGGAAATT CATATCCCCT GTTCGTCTCA TGCGCGTCCT 

2801 CCGTCCCCAA TCTAAAAAGC AATTGAAAAG GTCTATGCAA TAAAGGCAGT 
2851 CGCTTCATTC CTCTCAAAAA AAAAA 



BLAST Results 

No BLAST result 

Medline entries 



99045661: 

Tumor necrosis factor-alpha induces adhesion molecule 
expression through the sphingosine kinase pathway. 

98395082: 

Molecular cloning and functional characterization 
of murine sphingosine kinase. 

98241633: 

Purification and characterization of rat kidney sphingosine kinase. 
99178622: 

Sphingosine 1-phosphate: a prototype of a new class of second 
messengers. 



Peptide information for frame 3 



1 MNGHLEAEEQ QDQRPDQELT GSWGHGPRST LVRAKAMAPP PPPLAASTSL 

51 LHGEFGSYPA RGPRFALTLT SQALHIQRLR PKPEARPRGG LVPLAEVSGC 

101 CTLRSRSPSD SAAYFCIYTY PRGRRGARRR ATRTFRADGA ATYEENRAEA 

151 QRWATALTCL LRGLPLPGDG EITPDLLPRP PRLLLLVNPF GGRGLAWQWC 

201 KNHVLPMISE AGLSFNLIQT ERQNHARELV QGLSLSEWDG IVTVSGDGLL 

251 HEVLNGLLDR PDWEEAVKMP VGILPCGSGN ALAGAVNQHG GFEPALGLDL 

301 LLNCSLLLCR GGGHPLDLLS VTLASGSRCF SFLSVAWGFV SDVDIQSERF 

351 RALGSARFTL GTVLGLATLH TYRGRLSYLP ATVEPASPTP AHSLPRAKSE 

401 LTLTPDPAPP MAHSPLHRSV SDLPLPLPQP ALASPGSPEP LPILSLNGGG 

451 PELAGDWGGA GDAPLSPDPL LSSPPGSPKA ALHSPVSEGA PVIPPSSGLP 

501 LPTPDARVGA STCGPPDHLL PPLGTPLPPD WVTLEGDFVL MLAISPSHLG 

551 ADLVAAPHAR FDDGLVHLCW VRSGISRAAL LRLFLAMERG SHFSLGCPQL 

601 GYAAARAFRL EPLTPRGVLT VDGEQVEYGP LQAQMHPGIG TLLTGPPGCP 

651 GREP 



ORF from 270 bp to 2231 bp; peptide length: 654 
Category: similarity to known protein 

BLASTP hits 
Entry SPAC4A8_7 from database TREMBL: 

gene: "SPAC4A8.07c"; product: "hypothetical protein"; S.pombe
chromosome I cosmid c4A8.

Score = 301, P = 7.9e-32, identities = 68/190, positives = 109/190

Entry CEC34C6_3 from database TREMBLNEW: 

product: "C34C6.5"; Caenorhabditis elegans cosmid C34C6 

>TREMBL:CEC34C6_3 product: "C34C6.5"; Caenorhabditis elegans cosmid

C34C6 

Score = 273, P = 9.0e-29, identities = 78/265, positives = 142/265
Entry S67059 from database PIR: 

hypothetical protein YOR171c - yeast (Saccharomyces cerevisiae)
>TREMBL:SC55021_9 gene: "03615"; product: "03615p"; Saccharomyces
cerevisiae cosmid pUOA1258 from chromosome 15R. >TREMBL:SCYOR170W_2
S.cerevisiae chromosome XV reading frame ORF YOR170w
Score = 253, P = 2.0e-25, identities = 70/234, positives = 116/234
Entry S51398 from database PIR: 

hypothetical protein YLR260w - yeast (Saccharomyces cerevisiae) 
>TREMBL:SCL8479_4 gene: "YLR260W"; product: "Ylr260wp"; Saccharomyces 
cerevisiae chromosome XII cosmid 8479. 

Score = 251, P = 1.0e-24, identities = 62/198, positives = 103/198



Alert BLASTP hits for DKFZphfbr2_82m6, frame 3

TREMBL:AF068749_1 gene: "SPHK1b"; product: "sphingosine kinase"; Mus
musculus sphingosine kinase (SPHK1b) mRNA, complete cds., N = 2, Score
= 615, P = 1.2e-92

TREMBL:AF068748_1 gene: "SPHK1a"; product: "sphingosine kinase"; Mus
musculus sphingosine kinase (SPHK1a) mRNA, partial cds., N = 2, Score =
616, P = 2e-92

TREMBL:ATF18E5_16 gene: "F18E5.160"; product: "putative protein";
Arabidopsis thaliana DNA chromosome 4, BAC clone F18E5 (ESSAII
project), N = 2, Score = 370, P = 6.8e-33



>TREMBL:AF068748_1 gene: "SPHK1a"; product: "sphingosine kinase"; Mus
musculus sphingosine kinase (SPHK1a) mRNA, partial cds.
Length = 504

HSPs: 



Score = 616 (92.4 bits), Expect = 2.0e-92, Sum P(2) = 2.0e-92
Identities = 128/260 (49%), Positives = 173/260 (66%)



Query: 154 ATALTCLLRGLPLPGDGEITPDLLPRPPRLLLLVNPFGGRGLAWQWCKNHVLPMISEAGL 213
           A    C      L  + E    LLPRP R+L+L+NP GG+G A Q  ++ V P + EA +
Sbjct: 110 APVAPCQREPRDLAMEPECPRGLLPRPCRVLVLLNPQGGKGKALQLFQSRVQPFLEEAEI 169


Query: 214 SFNLIQTERQNHARELVQGLSLSEWDGIVTVSGDGLLHEVLNGLLDRPDWEEAVKMPVGI 273
           +F LI TER+NHARELV    L  WD +  +SGDGL+HEV+NGL++RPDWE A++ P+
Sbjct: 170 TFKLILTERKNHARELVCAEELGHWDALAVMSGDGLMHEVVNGLMERPDWETAIQKPLCS 229


Query: 274 LPCGSGNALAGAVNQHGGFEPALGLDLLLNCSLLLCRGGGHPLDLLSVTLASGSRCFSFL 333
           LP GSGNALA +VN + G+E     DLL+NC+LLLCR    P++LLS+  ASG R +S L
Sbjct: 230 LPGGSGNALAASVNHYAGYEQVTNEDLLINCTLLLCRRRLSPMNLLSLHTASGLRLYSVL 289


Query: 334 SVAWGFVSDVDIQSERFRALGSARFTLGTVLGLATLHTYRGRLSYLPA-TVEPASPTPAH 392
           S++WGFV+DVD++SE++R LG  RFT+GT   LA+L  Y+G+L+YLP  TV  AS  PA
Sbjct: 290 SLSWGFVADVDLESEKYRRLGEIRFTVGTFFRLASLRIYQGQLAYLPVGTV--ASKRPAS 347


Query: 393 SL-PRAKSELTLTPDPAPPMAH 413
           +L  +   +  L P   P  +H
Sbjct: 348 TLVQKGPVDTHLVPLEEPVPSH 369




Score = 324 (48.6 bits), Expect = 2.0e-92, Sum P(2) = 2.0e-92
Identities = 72/160 (45%), Positives = 100/160 (62%)




Query: 499 LPLPTPDARVGASTC---GPPDHLLPPLGTPLPPDWVTL-EGDFVLMLAISPSHLGADLV 554
           LP+ T ++ AST GP D L PL P+P W + E DF+L+L + +HL ++L
Sbjct: 335 LPVGTVASKRPASTLVQKGPVDTHLVPLEEPVPSHWTWPEQDFLLVLVLLHTHLSSELF 394


Query: 555 AAPHARFDDGLVHLCWVRSGISRAALLRLFLAMERGSHFSLGCPQLGYAAARAFRLEPLT 614
           AAP  R + G++HL +VR+G+SRAALLRLFLAM++G H  L CP L +    AFRLEP +
Sbjct: 395 AAPMGRCEAGVMHLFYVRAGVSRAALLRLFLAMQKGKHMELDCPYLVHVPVVAFRLEPRS 454


Query: 615 PRGVLTVDGEQVEYGPLQAQMHPGIGTLLTGPPGCP-GRE 653
            RGV +VDGE +    +Q Q+HP    ++ G    P GR+
Sbjct: 455 QRGVFSVDGELMVCEAVQGQVHPNYLWMVCGSRDAPSGRD 494




Score = 37 (5.6 bits), Expect = 3.6e-62, Sum P(2) = 3.6e-62
Identities = 8/20 (40%), Positives = 9/20 (45%)

Query: 459 GAGDAPLSPDPLLSSPPGSP 478
           G+ DAP   D     PP  P
Sbjct: 485 GSRDAPSGRDSRRGPPPEEP 504



Pedant information for DKFZphfbr2_82m6, frame 3 



Report for DKFZphfbr2_82m6.3

[LENGTH] 654

[MW] 69207.45

[pI] 6.47

[HOMOL] TREMBL:AF068749_1 gene: "SPHK1b"; product: "sphingosine kinase"; Mus musculus
sphingosine kinase (SPHK1b) mRNA, complete cds. 2e-50

[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YLR260w] 4e-20

[PROSITE] AMIDATION 1

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] MYRISTYL 12

[PROSITE] CK2_PHOSPHO_SITE 6

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 8

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 20.18 %



SEQ MNGHLEAEEQQDQRPDQELTGSWGHGPRSTLVRAKAMAPPPPPLAASTSLLHGEFGSYPA

SEG xxxxxxxxxxxxx

PRD ccchhhhhhhhcccccceeecccccccceeehhhhhccccccceeeceeeeccccccccc

SEQ RGPRFALTLTSQALHIQRLRPKPEARPRGGLVPLAEVSGCCTLRSRSPSDSAAYFCIYTY

SEG

PRD cccceeehhhhhhhhhhhhhccccccccccceeeeeeeceeeeeecccccceeeeeeeec



SEQ PRGRRGARRRATRTFRADGAATYEENRAEAQRWATALTCLLRGLPLPGDGEITPDLLPRP 

SEG xxxxxxxxxxxxxxxxxxxxx xxxxx

PRD ccccchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhccccccccccccccccccc 

SEQ PRLLLLVNPFGGRGLAWQWCKNHVLPMISEAGLSFNLIQTERQNHARELVQGLSLSEWDG 

SEG xxxxxx 

PRD ceeeeeeecccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhccccce 

SEQ IVTVSGDGLLHEVLNGLLDRPDWEEAVKMPVGILPCGSGNALAGAVNQHGGFEPALGLDL 

SEG xxxxx 

PRD eeeecccccceeeccccccccchhhhhccceeeccccccccccccccccccccchhhhhh 

SEQ LLNCSLLLCRGGGHPLDLLSVTLASGSRCFSFLSVAWGFVSDVDIQSERFRALGSARFTL 

SEG xxxxxxxxxxxxx 

PRD hhhhhhccccccccccceeeeeeccccceeeeeeeeccccceeeehhhhhhhhhhhhhhc 

SEQ GTVLGLATLHTYRGRLSYLPATVEPASPTPAHSLPRAKSELTLTPDPAPPMAHSPLHRSV 

SEG 

PRD hhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SDLPLPLPQPALASPGSPEPLPILSLNGGGPELAGDWGGAGDAPLSPDPLLSSPPGSPKA 

SEG xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxx

PRD ccccccccccccccccccccceeeeeccccccccccccccccccccccccccccccccce 



SEQ ALHSPVSEGAPVIPPSSGLPLPTPDARVGASTCGPPDHLLPPLGTPLPPDWVTLEGDFVL

SEG xx xxxxxxxxxxxxxxx

PRD eeccccccccccccccccccccccccccccccccccccccccccccccccccccccccee

SEQ MLAISPSHLGADLVAAPHARFDDGLVHLCWVRSGISRAALLRLFLAMERGSHFSLGCPQL

SEG

PRD eeeeecccccccccccccccccccceeeeeeeccchhhhhhhhhhhhhcccceeecccch

SEQ GYAAARAFRLEPLTPRGVLTVDGEQVEYGPLQAQMHPGIGTLLTGPPGCPGREP

SEG xxxxxxxxxxxxxxx

PRD hhhhhhhhhhccccccceeeeccceeecccccccccccccceeecccccccccc



Prosite for DKFZphfbr2_82m6.3



PS00001 303->307 ASN_GLYCOSYLATION PDOC00001

PS00002 245->249 GLYCOSAMINOGLYCAN PDOC00002

PS00004 129->133 CAMP_PHOSPHO_SITE PDOC00004

PS00005 102->105 PKC_PHOSPHO_SITE PDOC00005

PS00005 134->137 PKC_PHOSPHO_SITE PDOC00005

PS00005 220->223 PKC_PHOSPHO_SITE PDOC00005

PS00005 347->350 PKC_PHOSPHO_SITE PDOC00005

PS00005 355->358 PKC_PHOSPHO_SITE PDOC00005

PS00005 371->374 PKC_PHOSPHO_SITE PDOC00005

PS00005 477->480 PKC_PHOSPHO_SITE PDOC00005

PS00005 614->617 PKC_PHOSPHO_SITE PDOC00005

PS00006 107->111 CK2_PHOSPHO_SITE PDOC00006

PS00006 142->146 CK2_PHOSPHO_SITE PDOC00006

PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006

PS00006 236->240 CK2_PHOSPHO_SITE PDOC00006

PS00006 341->345 CK2_PHOSPHO_SITE PDOC00006

PS00006 419->423 CK2_PHOSPHO_SITE PDOC00006

PS00007 106->115 TYR_PHOSPHO_SITE PDOC00007

PS00008 56->62 MYRISTYL PDOC00008

PS00008 212->218 MYRISTYL PDOC00008

PS00008 232->238 MYRISTYL PDOC00008

PS00008 272->278 MYRISTYL PDOC00008

PS00008 277->283 MYRISTYL PDOC00008

PS00008 279->285 MYRISTYL PDOC00008

PS00008 361->367 MYRISTYL PDOC00008

PS00008 476->482 MYRISTYL PDOC00008

PS00008 509->515 MYRISTYL PDOC00008

PS00008 574->580 MYRISTYL PDOC00008

PS00008 590->596 MYRISTYL PDOC00008

PS00008 640->646 MYRISTYL PDOC00008

PS00009 122->126 AMIDATION PDOC00009



(No Pfam data available for DKFZphfbr2_82m6.3)
DKFZphfkd2_lj9



group: kidney derived 

DKFZphfkd2_lj9.3 encodes a novel 105-amino-acid protein with high similarity to the Xenopus laevis
XLCL2 protein.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein may be useful for studying the expression profile of kidney-specific
genes.

strong similarity to XLCL2 protein, African clawed frog 

complete cDNA, complete cds, EST hits 

Sequenced by LMU 

Locus: unknown

Insert length: 2955 bp 

Poly A stretch at pos. 2935, polyadenylation signal at pos. 2915 

1 GGGGGGGGCT GAGTGCTCAG TGGAGAGCGG GGAGTTGTGT CCACCTTGCC 
51 GACGTCGCTA GCCGTGGGGC TGTCCTGGGA AGGCGGACGG CGAGCGCCCG 

101 GTGTCCGCAC TCGGCCGCCT GCCGTGCCCG TCTGCGCCCG TGTCATCCTC 

151 ACTCGGGACG CAGGGACCGT TTTTAAATCA CAGGGGCGTG TGTCAGCCTG 

201 CCCTAGGACT TCATGTCTAT ATATTTCCCC ATTCACTGCC CCGACTATCT 

251 GAGATCGGCC AAGATGACTG AGGTGATGAT GAACACCCAG CCCATGGAGG 

301 AGATCGGCCT CAGCCCCCGC AAGGATGGCC TTTCCTACCA GATCTTCCCA 

351 GACCCGTCAG ATTTTGACCG CCGCTGCAAA CTGAAGGACC GTCTGCCCTC 

401 CATAGTGGTG GAACCCACAG AAGGGGAGGT GGAGAGCGGG GAGCTCCGGT 

451 GGCCCCCTGA GGAGTTCCTG GTCCAGGAGG ATGAGCAAGA TAACTGCGAA 

501 GAGACAGCGA AAGAAAATAA AGAGCAGTAG AGTCCCTGTG GACTCCCATG 

551 GGTCATACCA GCCAGCATCT GTTCCTGAAC TGTGTTTTTC CCATCATGAC 

601 GGAAGAAGAG AGTGAGCCGC AATTGTTCTG AAAATGTCAA ACGAGGCTTC 

651 TGTTTTGCAC CTGCAGATCA CCGAGTTGGT TTTCTTTTCT TTTCTTGCCT 

701 TTTTTTTTTT TTTGAAATTT GCCGAGCAGT GGAGCCCTCT GACAATTTGC 

751 AAGGCCCTCT GAGAAAGGAA GCTGCTTAGA GCCAGGGGGT TAGTGGGTGA 

801 GGGGAGCGAG TGCTGTTTTT GAGATCATTA TCTGAACTCA GGCAGCCTAG 

851 TAGAGGCAGT GGTGGGATTC CAATGGGTCT TGGTGGGTGG GAGGTGGGGC 

901 ATGTGCAAAG CAAGCAAGGA ACATTTGGGG TAAGAAAACA AACATGAGGC 

951 AAAAGAAAAA ATACATGTTT TTAAGAAAAC ATTGAGCAGA GAACTGCAGC 
1001 CAGGATGCGC TCAGCAGACA TTCACTCTGG CCGCTGGGAC ATCAGAAAAC 
1051 AAAGTCTTCA TCTCTCTCTC CAGTTTCACC CACCCCACCC TTTGCTTTCA 
1101 TTTCAGGTGT GTTGGTCTAT ATGACAGGGA GGAGAGTAAA GGAGAGCAGG 
1151 AGCAATTGGC TGCCTGCAAA GCCAGCTGGA GGTGAAGTGC AGGAAAGGAA 
1201 AGGTCACCCC ATTCTACTCC ATGGCCTCTC TGCTCCCAGC TGTGGTAGGC 
1251 TCACATAGCC AGTGTGATCG GTTTTTAAGA GGCAGTGCTT TTCAGCTTTT 
1301 CTCCCTGATA TATCCATTTT GCTTCCCAGC ACTTTTTAGG AGTAGTGAGA 
1351 GCACTTCCTG CCCTTGTTGG AAGCCCCAGG GTGGACACTC AGCACGAAGG 
1401 TCTCTCCCTT AACTGCTGCC CTTCCAAGAC TTGCTCCCGA GATGGAGTGG 
1451 GCGTGGTCTT CCAGGCTGGC CCTTCCTTCT CCTCACCGCC ACCTTCCCTG 
1501 CCCCAGCCCC AGCAGCCATG GGTACATGGG TCCCCAGCTC ACCTATGGAT 
1551 TCCCGCCAGT CTGCCCAGCT GCAGTACTCA CGCCCCATGG GGGATCTTGG 
1601 TCTGTTTTTC TTGTGGGAGC CTAGTGGAGA GCAGACGTGG CTTTTTATGT 
1651 GTCTTGTTGG GGAGGTGACT TGCATGGTGG GGACAAGGCT GTCGTGGCAA 
1701 CCTTGGGATC GAGTTTGAGA CTAAAGGATG TCATGAGATC CCTGGCTTCT 
1751 CCCCATGTTG TTCCCGGACA AGGGCAGAAG GGAGGCATGG CAAGGGACCT 
1801 CTGCTGTCCT TACTCAACAG TGGTCCTCAT CCCTCCCCAC CTCCCACTGC 
1851 TTCCTGCAAG GGCACCAGTT GTATGAGAAA GTTGGCCTTT GGACTTAGGA 
1901 TTTCTTATTG TAGCTAAGAG CCATCTGAAG CAGCAGGTTG CAGGACAAAT 
1951 GCTTCAGTCC GCCGAGAGCA GTACCGTGTG GCCAAGAGGT GGACTCAGAG 
2001 CCTTCCTTGA GCTAAACTCG GCCAACCAAG GCACGCAGCA TGTCCCCTCA 
2051 GGTCTCCAGT CAGTCCAGGT TGACCCTCAG TTCTGGACGT GTGTATATAG 
2101 CTGTATTTAA TACCTCAAGG TCATTGTGGC TCTGGGGATG CCAGGGCAGG 
2151 AGGACGAGGG TGCGCTGTGG ACACAGCAGT CCGCGGAATT CCGTTCTGGG 
2201 AAGCCAATGG TCGCCGGCAC CCCTTGCTTC CTCCCTCTGT TGTCTGCCTG 
2251 TGTGACACAC ATCAATGGCA ATAACTTCTT CCAACTCCTC GCAGAAGTGG 
2301 GAGAGGCCGG CAGCCTGCAC CGAGAGGGGC TTTCCTCTCT CTTGCTCCCC 
2351 GCTTCGTTCT GTTTTGGCTG CAGAGAGTGG TTCATCCATA CTCTCATTCC 
2401 CTCGCCTCCC CTTGTGGACG GGGGTCTTGC CTTTTCAATT CCTGTGTTTT 
2451 GGTGTCTTCC CTTATCTGCT ACCCTGAATC ACCTGTCCTG GTCTTGCTGT 
2501 GTGATGGGAA CATGCTTGTA AACTGCGTAA CAAATCTACT TTGTGTATGT 
2551 GTCTGTTTAT GGGGGTGGTT TATTATTTTT GCTGGTCCCT AGACCACTTT 
2601 GTATGACCGT TTGCAGTCTG AGCAGGCCAG GGGCTGACAG CTAATGTCAG 
2651 GACCCTCAGC GGTGGAGCCT GCTGGGGGGA CCCAGCTGCT CTTGGACAAG 
2701 TGGCTGAGCT CCTATCTGGC CTCCTCTTTT TTTTTTTTTT CAAGTAATTT 
2751 GTGTGTATTT CTAACTGATT GTATTGAAAA AATTCCTAGT ATTTCAGTAA 
2801 AAATGCCTGT TGTGAGATGA ACCTCCTGTA ACTTCTATCT GTTCTTTTTT 
2851 GAGGCTCAGG GAGAAACTAG CATTTTTTTT TTTCCAAACT ACTTTTTGTC 
2901 ACTGTGACAG TTGTAAATAA AGTTTGAAAA TGCTCAAAAA AAAAAAAAAA 
2951 AAAAC 



BLAST Results 



Entry HSG19750 from database EMBL: 
human STS A001X24. 
Score = 1050, P = 1.9e-39, identities = 212/213 

Entry HSG20267 from database EMBL: 
human STS A005C12. 
Score = 610, P = 4.1e-19, identities = 122/122 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 213 bp to 527 bp; peptide length: 105 
Category: strong similarity to known protein 
Classification: unset 



1 MSIYFPIHCP DYLRSAKMTE VMMNTQPMEE IGLSPRKDGL SYQIFPDPSD 
51 FDRRCKLKDR LPSIVVEPTE GEVESGELRW PPEEFLVQED EQDNCEETAK 
101 ENKEQ 
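The ORF coordinates and peptide lengths in these entries are consistent with 1-based, inclusive coordinates that end at the last sense codon: 213 to 527 bp spans 315 bp, i.e. 105 codons, and the TAG stop codon follows at bp 528-530. A minimal Python sketch of that bookkeeping, using the standard genetic code (NCBI translation table 1); the names `translate` and `orf_peptide_length` are hypothetical helpers, not part of the original annotation.

```python
# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order,
# first position varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon; '*' marks a stop, 'X' an unreadable codon."""
    dna = dna.upper().replace(" ", "")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def orf_peptide_length(start: int, end: int) -> int:
    """Residues encoded by an ORF given as 1-based, inclusive bp coordinates.

    Assumes, as the entries here suggest, that `end` is the last base of the
    final sense codon, so the stop codon is not counted.
    """
    span = end - start + 1
    if span % 3:
        raise ValueError(f"ORF {start}-{end} is not a whole number of codons")
    return span // 3


# Bases 213-242 of DKFZphfkd2_lj9 encode the first ten residues.
print(translate("ATGTCTATATATTTCCCCATTCACTGCCCC"))  # MSIYFPIHCP
print(orf_peptide_length(213, 527))                  # 105
```

The same arithmetic reproduces the other ORFs in this section; for example, 219 to 1187 bp gives 323 residues and 31 to 1866 bp gives 612.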

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_lj9, frame 3 

PIR:S52241 XLCL2 protein - African clawed frog, N = 1, Score = 443, P = 8e-42 

>PIR:S52241 XLCL2 protein - African clawed frog 
Length = 102 

HSPs: 

Score = 443 (66.5 bits), Expect = 8.0e-42, P = 8.0e-42 
Identities = 80/104 (76%), Positives = 95/104 (91%) 

Query: 1 MSIYFPIHCPDYLRSAKMTEVMMNTQPMEEIGLSPRKDGLSYQIFPDPSDFDRRCKLKDR 60 

MS+++PIHC DYLRSA+MTEV+MNTQ M+EIGLSPRKD SYQIFPDPSDF+R CKLKDR 
Sbjct: 1 MSVFYPIHCTDYLRSAEMTEVIMNTQSMDEIGLSPRKD--SYQIFPDPSDFERCCKLKDR 58 

Query: 61 LPSIVVEPTEGEVESGELRWPPEEFLVQEDEQDNCEETAKENKE 104 

LPSIVVEPTEG+VESGELRWPPEEF+V ED++ C++T KEN++ 
Sbjct: 59 LPSIVVEPTEGDVESGELRWPPEEFVVDEDKEGTCDQTKKENEQ 102 
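The identities and positives in an HSP summary can be recounted from the alignment midlines, where a letter marks an identical residue, '+' a conservative (positive-scoring) substitution and a space a mismatch or gap; positives include the identities. The printed percentages appear to be truncated rather than rounded (80/104 is 76.9%, shown as 76%; 285/598 is 47.7%, shown as 47%). A small Python sketch with hypothetical helper names; the midlines are those of the XLCL2 alignment, with the OCR-merged "W" in the second midline read as "VV" so that it agrees with the query line.

```python
from typing import List, Tuple


def hsp_counts(midlines: List[str]) -> Tuple[int, int]:
    """Return (identities, positives) counted from BLAST alignment midlines.

    A letter marks an identical residue and '+' a conservative substitution;
    spaces (mismatches or gaps) are ignored. Positives include identities.
    """
    identities = sum(ch.isalpha() for line in midlines for ch in line)
    conservative = sum(line.count("+") for line in midlines)
    return identities, identities + conservative


def percent(count: int, length: int) -> int:
    """Whole-number percentage, truncated (assumed from e.g. 80/104 -> 76%)."""
    return 100 * count // length


xlcl2_midlines = [
    "MS+++PIHC DYLRSA+MTEV+MNTQ M+EIGLSPRKD  SYQIFPDPSDF+R CKLKDR",
    "LPSIVVEPTEG+VESGELRWPPEEF+V ED++  C++T KEN++",
]
ident, pos = hsp_counts(xlcl2_midlines)
print(f"Identities = {ident}/104 ({percent(ident, 104)}%), "
      f"Positives = {pos}/104 ({percent(pos, 104)}%)")
# Identities = 80/104 (76%), Positives = 95/104 (91%)
```

Counting only characters, not columns, keeps the check robust to the collapsed spacing of the extracted midlines.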

Pedant information for DKFZphfkd2_lj9, frame 3 

Report for DKFZphfkd2_lj9.3 

[LENGTH] 105 

[MW] 12269.78 

[pI] 4.40 

[HOMOL] PIR:S52241 XLCL2 protein - African clawed frog 5e-44 
[KW] Alpha_Beta 

SEQ MSIYFPIHCPDYLRSAKMTEVMMNTQPMEEIGLSPRKDGLSYQIFPDPSDFDRRCKLKDR 
PRD cccccccccccchhhhhhhhhhhhcccccccccccccccceeeecccccccchhhhhhhc 

SEQ LPSIVVEPTEGEVESGELRWPPEEFLVQEDEQDNCEETAKENKEQ 
PRD ccceeeecccccccccccccccccceeeccccchhhhhhhhhccc 

(No Prosite data available for DKFZphfkd2_lj9.3) 
(No Pfam data available for DKFZphfkd2_lj9.3) 
DKFZphfkd2_24al5 



group: transmembrane protein 

DKFZphfkd2_24al5 encodes a novel 323 amino acid protein with similarity to C. elegans cosmid 
R07G3. 

The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes and as a new marker for kidney cells. 



similarity to C. elegans R07G3.8 
membrane regions: 1 

Summary DKFZphfkd2_24al5 encodes a novel 323 amino acid protein, with 
similarity to C. elegans R07G3.8. 



similarity to C. elegans R07G3.8 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1513 bp 

Poly A stretch at pos. 1494, no polyadenylation signal found 

1 GGGGTACTCG GCGGCGGCGG AGCGGGCGGC AGAGCAGGGC GGCGGCGACT 
51 CGCAGGGTAC CACCATCTTA AGGACAGAAA AGCTACAGGA CTCTAGGAGG 
101 CCACCGTCCT GATTTGGGAA GTCCAACTTA CTTTGGCCAG ACAGCAGCTA 
151 AGCTGGTTCA TCCCATCAGC CTGGATTGGT GAAACTGAAT CACAGGAGAT 
201 ATTTCCAGGT TTGCTGGGAT GGGAAACCTG CTCAAAGTCC TTACCAGGGA 
251 AATTGAAAAC TATCCACACT TTTTCCTGGA TTTTGAAAAT GCTCAGCCTA 
301 CAGAAGGAGA GAGAGAAATC TGGAACCAGA TCAGCGCCGT CCTTCAGGAT 
351 TCTGAGAGCA TCCTTGCAGA CCTGCAGGCT TACAAAGGCG CAGGCCCAGA 
401 GATCCGAGAT GCAATTCAAA ATCCCAATGA CATTCAGCTT CAAGAAAAAG 
451 CTTGGAATGC GGTGTGCCCT CTTGTTGTGA GGCTAAAGAG ATTTTACGAG 
501 TTTTCCATTA GACTAGAAAA AGCTCTTCAG AGTTTATTGG AATCTCTGAC 
551 TTGTCCACCC TACACACCAA CCCAACACCT GGAAAGGGAA CAGGCCCTGG 
601 CAAAGGAGTT TGCCGAAATT TTACATTTTA CCCTTCGATT CGATGAGCTG 
651 AAGATGAGGA ACCCGGCTAT TCAGAATGAC TTCAGCTACT ACAGAAGAAC 
701 AATCAGTCGC AACCGCATCA ACAACATGCA CCTAGACATT GAGAATGAAG 
751 TCAATAATGA GATGGCCAAT CGAATGTCCC TCTTCTATGC AGAAGCCACG 
801 CCAATGCTGA AAACCCTTAG CAATGCCACA ATGCACTTTG TCTCTGAAAA 
851 CAAAACTCTG CCAATAGAGA ACACCACAGA CTGCCTCAGC ACAATGACAA 
901 GTGTCTGTAA AGTCATGCTG GAAACTCCGG AGTACAGAAG TAGGTTTACG 
951 AGTGAAGAGA CCCTGATGTT CTGCATGAGG GTGATGGTGG GAGTCATCAT 
1001 CCTCTATGAC CATGTCCACC CTGTGGGAGC TTTCTGCAAG ACATCCAAGA 
1051 TCGATATGAA AGGCTGCATA AAAGTTTTGA AGGAGCAGGC CCCAGACAGT 
1101 GTGGAGGGGC TGCTAAATGC CCTCAGGTTC ACTACAAAGC ACTTGAACGA 
1151 TGAATCAACT TCCAAACAGA TTCGAGCAAT GCTTCAGTAG AGCTCTGCTC 
1201 AAAGAAGAGG ATCTATGTGC TGACCTCAGA AGATGTATAT GTTTACATAA 
1251 TTTAATACAG ATTGATGTTA ATACTTGTGT ATTTACATAA CCGTTTCCTT 
1301 CTTGTCACTG AAATATATGG ACCTTAATTT GTATCCTGAC TGACTCAACC 
1351 CAGCAGAGCA TAAATTGACT TGAGAGCCTT ACCTTTGATG TCTGAAATGA 
1401 AACCCCCTTC TCCAAAGGCA AAATTCGGAG ACTTTGATCT TTGCTACTGG 
1451 AGTCCTTTAA CAACATCTAT AACGATAAAA AATTCCTAAT TGTCAAAAAA 
1501 AAAAAAAAAA AAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 
ORF from 219 bp to 1187 bp; peptide length: 323 
Category: similarity to unknown protein 



1 MGNLLKVLTR EIENYPHFFL DFENAQPTEG EREIWNQISA VLQDSESILA 

51 DLQAYKGAGP EIRDAIQNPN DIQLQEKAWN AVCPLVVRLK RFYEFSIRLE 

101 KALQSLLESL TCPPYTPTQH LEREQALAKE FAEILHFTLR FDELKMRNPA 

151 IQNDFSYYRR TISRNRINNM HLDIENEVNN EMANRMSLFY AEATPMLKTL 

201 SNATMHFVSE NKTLPIENTT DCLSTMTSVC KVMLETPEYR SRFTSEETLM 

251 FCMRVMVGVI ILYDHVHPVG AFCKTSKIDM KGCIKVLKEQ APDSVEGLLN 

301 ALRFTTKHLN DESTSKQIRA MLQ 

BLASTP hits 

Entry CER07G3_7 from database TREMBL: 

gene: "R07G3.8"; Caenorhabditis elegans cosmid R07G3. 

Score = 544, P = 1.4e-52, identities = 119/323, positives = 186/323 



Alert BLASTP hits for DKFZphfkd2_24al5, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_24al5, frame 3 



Report for DKFZphfkd2_24al5.3 



[LENGTH] 323 

[MW] 37313.06 

[pI] 5.71 

[HOMOL] TREMBL:CER07G3_7 gene: "R07G3.8"; Caenorhabditis elegans cosmid R07G3. 4e- 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 4 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 5 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] TRANSMEMBRANE 1 



SEQ MGNLLKVLTREIENYPHFFLDFENAQPTEGEREIWNQISAVLQDSESILADLQAYKGAGP 

PRD ccccchhhhhhhhcccceeecccccccchhhhhhhhhhhhhhhcchhhhhhhhhhccccc 

MEM 

SEQ EIRDAIQNPNDIQLQEKAWNAVCPLVVRLKRFYEFSIRLEKALQSLLESLTCPPYTPTQH 

PRD hhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccchhh 

MEM 

SEQ LEREQALAKEFAEILHFTLRFDELKMRNPAIQNDFSYYRRTISRNRINNMHLDIENEVNN 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhccchhhhhhhhhhhhhhhh 

MEM 

SEQ EMANRMSLFYAEATPMLKTLSNATMHFVSENKTLPIENTTDCLSTMTSVCKVMLETPEYR 

PRD hhhhhhhhhhhhccchhhhhhhhceeecccccccccccccceeeeehhhhhhhhcccccc 

MEM 

SEQ SRFTSEETLMFCMRVMVGVIILYDHVHPVGAFCKTSKIDMKGCIKVLKEQAPDSVEGLLN 

PRD cccccchhhhhhhhhhhheeeeeeeccccccccccccccchhhhhhhhhccccchhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMM 

SEQ ALRFTTKHLNDESTSKQIRAMLQ 

PRD hhhhhhcccccccchhhhhhccc 

MEM 



Prosite for DKFZphfkd2_24al5.3 



PS00001  202->206  ASN_GLYCOSYLATION  PDOC00001
PS00001  211->215  ASN_GLYCOSYLATION  PDOC00001
PS00001  218->222  ASN_GLYCOSYLATION  PDOC00001
PS00005  96->99    PKC_PHOSPHO_SITE   PDOC00005
PS00005  138->141  PKC_PHOSPHO_SITE   PDOC00005
PS00005  275->278  PKC_PHOSPHO_SITE   PDOC00005
PS00005  305->308  PKC_PHOSPHO_SITE   PDOC00005
PS00005  314->317  PKC_PHOSPHO_SITE   PDOC00005
PS00006  28->32    CK2_PHOSPHO_SITE   PDOC00006
PS00006  105->109  CK2_PHOSPHO_SITE   PDOC00006
PS00006  244->248  CK2_PHOSPHO_SITE   PDOC00006
PS00006  276->280  CK2_PHOSPHO_SITE   PDOC00006
PS00007  231->240  TYR_PHOSPHO_SITE   PDOC00007
PS00008  297->303  MYRISTYL           PDOC00008



(No Pfam data available for DKFZphfkd2_24al5.3) 
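Each PROSITE hit for this protein is reported as start->end, where start is the 1-based position of the first matched residue and end is one past the last one (the N-glycosylation motif N-A-T-M at residues 202-205 is listed as 202->206). The Python sketch below reproduces these hits by scanning the 323-residue peptide with regular expressions that are hand translations of five PROSITE patterns (PS00001 and PS00005-PS00008). The translation rules, the names `PATTERNS` and `scan`, and the output format are assumptions for illustration; this is not the program that generated the table.

```python
import re
from typing import Dict, List

# Hand translations of PROSITE patterns: x -> '.', {ABC} -> [^ABC], x(2,3) -> .{2,3}
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",         # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",               # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",              # PS00006: [ST]-x(2)-[DE]
    "TYR_PHOSPHO_SITE": r"[RK].{2,3}[DE].{2,3}Y",   # PS00007: [RK]-x(2,3)-[DE]-x(2,3)-Y
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",     # PS00008: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}

# Peptide of DKFZphfkd2_24al5, frame 3 (323 residues), 50 residues per literal.
DKFZ_24AL5 = (
    "MGNLLKVLTREIENYPHFFLDFENAQPTEGEREIWNQISAVLQDSESILA"
    "DLQAYKGAGPEIRDAIQNPNDIQLQEKAWNAVCPLVVRLKRFYEFSIRLE"
    "KALQSLLESLTCPPYTPTQHLEREQALAKEFAEILHFTLRFDELKMRNPA"
    "IQNDFSYYRRTISRNRINNMHLDIENEVNNEMANRMSLFYAEATPMLKTL"
    "SNATMHFVSENKTLPIENTTDCLSTMTSVCKVMLETPEYRSRFTSEETLM"
    "FCMRVMVGVIILYDHVHPVGAFCKTSKIDMKGCIKVLKEQAPDSVEGLLN"
    "ALRFTTKHLNDESTSKQIRAMLQ"
)


def scan(seq: str) -> Dict[str, List[str]]:
    """Report every match of every pattern as 'start->end' (1-based start,
    end one past the last matched residue)."""
    hits: Dict[str, List[str]] = {}
    for name, pattern in PATTERNS.items():
        found = []
        # A zero-width lookahead lets overlapping matches all be reported.
        for m in re.finditer("(?=(%s))" % pattern, seq):
            start = m.start() + 1
            found.append("%d->%d" % (start, start + len(m.group(1))))
        hits[name] = found
    return hits


for name, sites in scan(DKFZ_24AL5).items():
    print(name, sites)
# first line: ASN_GLYCOSYLATION ['202->206', '211->215', '218->222']
```

The lookahead matters for tables such as the one for DKFZphfkd2_24bl5, where adjacent sites (PKC at 116 and 117) overlap.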
DKFZphfkd2_24bl5 



group: metabolism 

DKFZphfkd2_24bl5 encodes a novel 612 amino acid protein with similarity to bacterial and yeast 
phosphoglucomutases and phosphomannomutases. 

The novel protein contains a phosphoserine signature typical of phosphoglucomutase (EC 
5.4.2.2) or phosphomannomutase (EC 5.4.2.8). The protein therefore probably takes part in the 
conversion of hexose phosphates. 

The new protein can find application in modulation of hexose metabolism pathways and as a new 
enzyme for biotechnological production processes. 



similarity to phosphomannomutases 
complete cDNA, complete cds, EST hits 

potential start at bp 30 matches Kozak consensus PuCNatgG 
Sequenced by GBF 

Locus: /map="158.8 cR from top of Chr4 linkage group" 
Insert length: 2204 bp 

Poly A stretch at pos. 2186, no polyadenylation signal found 
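In Kozak's consensus for vertebrate translation starts, GCCRCCATGG, the two most influential positions are a purine (R) at -3 and a G at +4, counting the A of ATG as +1. In the listing below, the ORF's ATG begins at bp 31, with G at -3 and G at +4. The Python sketch checks only those two positions; the function name `kozak_context` and the "strong" label are illustrative assumptions, not part of the original annotation, and the rest of the consensus is not scored.

```python
def kozak_context(cdna: str, atg: int) -> dict:
    """Inspect positions -3 and +4 around an ATG whose A is at 1-based `atg`.

    Assumption: only these two positions are checked; 'strong' means a
    purine at -3 and a G at +4.
    """
    i = atg - 1  # 0-based index of the A of ATG
    if cdna[i:i + 3].upper() != "ATG":
        raise ValueError(f"no ATG at position {atg}")
    if i < 3 or i + 3 >= len(cdna):
        raise ValueError("not enough flanking sequence around the ATG")
    minus3, plus4 = cdna[i - 3].upper(), cdna[i + 3].upper()
    return {
        "-3": minus3,
        "+4": plus4,
        "strong": minus3 in {"A", "G"} and plus4 == "G",
    }


# First 50 bp of DKFZphfkd2_24bl5, one literal per 10-base block; the ORF starts at bp 31.
head = "GGGCTCTGCA" "GCGGTAGCAC" "AAGCTCAGCG" "ATGGCGGCTC" "CAGAAGGCAG"
print(kozak_context(head, 31))  # {'-3': 'G', '+4': 'G', 'strong': True}
```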



1 GGGCTCTGCA GCGGTAGCAC AAGCTCAGCG ATGGCGGCTC CAGAAGGCAG 
51 CGGTCTAGGC GAGGACGCCC GGCTGGACCA GGAGACCGCC CAGTGGCTGC 
101 GCTGGGACAA GAATTCCTTA ACTTTGGAGG CAGTGAAACG ACTAATAGCA 
151 GAAGGTAATA AAGAAGAACT ACGAAAATGT TTTGGGGCCC GAATGGAGTT 
201 TGGGACAGCT GGCCTCCGAG CTGCTATGGG ACCTGGAATT TCTCGTATGA 
251 ATGACTTGAC CATCATCCAG ACTACACAGG GATTTTGCAG ATACCTGGAA 
301 AAACAATTCA GTGACTTAAA GCAGAAAGGC ATCGTGATCA GTTTTGACGC 
351 CCGAGCTCAT CCATCCAGTG GGGGTAGCAG CAGAAGGTTT GCCCGACTTG 
401 CTGCAACCAC ATTTATCAGT CAGGGGATTC CTGTGTACCT CTTTTCTGAT 
451 ATAACGCCAA CCCCCTTTGT GCCCTTCACA GTATCACATT TGAAACTTTG 
501 TGCTGGAATC ATGATAACTG CATCTCACAA TCCAAAGCAG GATAATGGTT 
551 ATAAGGTCTA TTGGGATAAT GGAGCTCAGA TCATTTCTCC TCACGATAAA 
601 GGGATTTCTC AAGCTATTGA AGAAAATCTA GAACCGTGGC CTCAAGCTTG 
651 GGACGATTCT TTAATTGATA GCAGTCCACT TCTCCACAAT CCGAGTGCTT 
701 CCATCAATAA TGACTACTTT GAAGACCTTA AAAAGTACTG TTTCCACAGG 
751 AGCGTGAACA GGGAGACAAA GGTGAAGTTT GTGCACACCT CTGTCCATGG 
801 GGTGGGTCAT AGCTTTGTGC AGTCAGCTTT CAAGGCTTTT GACCTTGTTC 
851 CTCCTGAGGC TGTTCCTGAA CAGAGAGATC CGGATCCTGA GTTTCCAACA 
901 GTGAAATACC CGAATCCCGA AGAGGGGAAA GGTGTCTTGA CTTTGTCTTT 
951 TGCTTTGGCT GACAAAACCA AGGCCAGAAT TGTTTTAGCT AACGACCCGG 
1001 ATGCTGATAG ACTTGCTGTG GCAGAAAAGC AAGACAGTGG TGAATGGAGG 
1051 GTGTTTTCAG GCAATGAGTT GGGGGCCCTC CTGGGCTGGT GGCTTTTTAC 
1101 ATCTTGGAAA GAGAAGAACC AGGATCGCAG TGCTCTCAAA GACACGTACA 
1151 TGTTGTCCAG CACCGTCTCC TCCAAAATCT TGCGGGCCAT TGCCTTAAAG 
1201 GAAGGTTTTC ATTTTGAGGA AACATTAACT GGCTTTAAGT GGATGGGAAA 
1251 CAGAGCCAAA CAGCTAATAG ACCAGGGGAA AACTGTTTTA TTTGCATTTG 
1301 AAGAAGCTAT TGGATACATG TGCTGCCCTT TTGTTCTGGA CAAAGATGGA 
1351 GTCAGTGCCG CTGTCATAAG TGCAGAGTTG GCTAGCTTCC TAGCAACCAA 
1401 GAATTTGTCT TTGTCTCAGC AACTAAAGGC CATTTATGTG GAGTATGGCT 
1451 ACCATATTAC TAAAGCTTCC TATTTTATCT GCCATGATCA AGAAACCATT 
1501 AAGAAATTAT TTGAAAACCT CAGAAACTAC GATGGAAAAA ATAATTATCC 
1551 AAAAGCTTGT GGCAAATTTG AAATTTCTGC CATTAGGGAC CTTACAACTG 
1601 GCTATGATGA TAGCCAACCT GATAAAAAAG CTGTTCTTCC CACTAGTAAA 
1651 AGCAGCCAAA TGATCACCTT CACCTTTGCT AATGGAGGCG TGGCCACCAT 
1701 GCGCACCAGT GGGACAGAGC CCAAAATCAA GTACTATGCA GAGCTGTGTG 
1751 CCCCACCTGG GAACAGTGAT CCTGAGCAGC TGAAGAAGGA ACTGAATGAA 
1801 CTGGTCAGTG CTATTGAAGA ACATTTTTTC CAGCCACAGA AGTACAATCT 
1851 GCAGCCAAAA GCAGACTAAA ATAGTCCAGC CTTGGGTATA CTTGCATTTA 
1901 CCTACAATTA AGCTGGGTTT AACTTGTTAA GCAATATTTT TAAGGGCCAA 
1951 ATGATTCAAA ACATCACAGG TATTTATGTG TTTTACAAAG ACCTACATTC 
2001 CTCATTGTTT CATGTTTGAC CTTTAAGGTG AAAAAAGAAA ATGGCCAAAC 
2051 CCAACAAACT AACATTCCTA CTAAAAAGTT GAGCTTGGAC ATATTTTGAA 
2101 TTTTTGTAAG TGAAGATTTT TAAACTGACT AACTTAAAAA AATAGATTGT 
2151 AATTGATGTG CCTTAATTTG CATAAATCAT AAATGTAAAA AAAAAAAAAA 
2201 AAAA 



BLAST Results 



Entry HS705145 from database EMBL: 
human STS WI-6820. 
Score = 1261, P = 3.6e-52, identities = 253/254 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 31 bp to 1866 bp; peptide length: 612 
Category: strong similarity to known protein 



1 MAAPEGSGLG EDARLDQETA QWLRWDKNSL TLEAVKRLIA EGNKEELRKC 
51 FGARMEFGTA GLRAAMGPGI SRMNDLTIIQ TTQGFCRYLE KQFSDLKQKG 
101 IVISFDARAH PSSGGSSRRF ARLAATTFIS QGIPVYLFSD ITPTPFVPFT 
151 VSHLKLCAGI MITASHNPKQ DNGYKVYWDN GAQIISPHDK GISQAIEENL 
201 EPWPQAWDDS LIDSSPLLHN PSASINNDYF EDLKKYCFHR SVNRETKVKF 
251 VHTSVHGVGH SFVQSAFKAF DLVPPEAVPE QRDPDPEFPT VKYPNPEEGK 
301 GVLTLSFALA DKTKARIVLA NDPDADRLAV AEKQDSGEWR VFSGNELGAL 
351 LGWWLFTSWK EKNQDRSALK DTYMLSSTVS SKILRAIALK EGFHFEETLT 
401 GFKWMGNRAK QLIDQGKTVL FAFEEAIGYM CCPFVLDKDG VSAAVISAEL 
451 ASFLATKNLS LSQQLKAIYV EYGYHITKAS YFICHDQETI KKLFENLRNY 
501 DGKNNYPKAC GKFEISAIRD LTTGYDDSQP DKKAVLPTSK SSQMITFTFA 
551 NGGVATMRTS GTEPKIKYYA ELCAPPGNSD PEQLKKELNE LVSAIEEHFF 
601 QPQKYNLQPK AD 



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphf kd2_24bl5, frame 1 

TREMBL:CEY43F4B_5 gene: "Y43F4B.5"; Caenorhabditis elegans cosmid 
Y43F4B, N = 1, Score = 1431, P = 1.6e-146 

TREMBL:SPCC1840_5 gene: "SPCC1840.05c"; product: "similarity to 
phosphomannomutases"; S.pombe chromosome III cosmid c1840., N = 1, 
Score = 1210, P = 4.2e-123 

PIR:S54585 hypothetical protein YMR278w - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 1046, P = 1e-105 

PIR:A71299 probable phosphomannomutase (manB) - syphilis spirochete, N 
= 1, Score = 697, P = 9.7e-69 



>TREMBL:CEY43F4B_5 gene: "Y43F4B.5"; Caenorhabditis elegans cosmid Y43F4B 
Length = 595 

HSPs: 

Score = 1431 (214.7 bits), Expect = 1.6e-146, P = 1.6e-146 
Identities = 285/598 (47%), Positives = 393/598 (65%) 

Query: 13 ARLDQETAQWLRWDKNSLTLEAVKRLIAEGNKEELRKCFGARMEFGTAGLRAAMGPGISR 72 

A+LD++ A WL WDKN +++L+ E N + L+ R+ FGTAG+R+ M G R 

Sbjct: 6 AKLDKQVADWLAWDKNDKNRNEIQKLVDEKNVDALKARMDTRLVFGTAGVRSPMQAGFGR 65 

Query: 73 MNDLTIIQTTQGFCRYLEKQFSDLKQKGIVISFDARAHPSSGGSSRRFARLAATTFISQG 132 

+NDLTIIQ T GF R++ + K G+ I FD R + SRRFA L+A F+ 

Sbjct: 66 LNDLTIIQITHGFARHMLNVYGQPKN-GVAIGFDGRYN------SRRFAELSANVFVRNN 118 

Query: 133 IPVYLFSDITPTPFVPFTVSHLKLCAGIMITASHNPKQDNGYKVYWDNGAQIISPHDKGI 192 

IPVYLFS+++PTP V + L AG++ITASHNPK+DNGYK YW NGAQII PHD I 
Sbjct: 119 IPVYLFSEVSPTPVVSWATIKLGCDAGLIITASHNPKEDNGYKAYWSNGAQIIGPHDTEI 178 

Query: 193 SQAIEENLEPWPQAWDDSLIDSSPLLHNPSASINNDYFEDLKKYCFHRSVNRETKVKFVH 252 

+ E +P + WD S + SSPL H+ I+ YFE K F R +N T +KF + 
Sbjct: 179 VRIKEAEPQPRDEYWDLSELKSSPLFHSADVVID-PYFEVEKSLNFTREINGSTPLKFTY 237 

Query: 253 TSVHGVGHSFVQSAFKAFDLVPPE--AVPEQRDPDPEFPTVKYPNPEEGKGVLTLSFALA 310 

++ HG+G+ + + F F +V EQ+DP+P+FPT+ +PNPEEG+ VLTL+ A 

Sbjct: 238 SAFHGIGYHYTKRMFAEFGFPASSFISVAEQQDPNPDFPTIPFPNPEEGRKVLTLAMETA 297 
Query: 311 DKTKARIVLANDPDADRLAVAEKQDSGEWRVFSGNELGALLGWWLFTSWKEKNQDRSALK 370 

DK + ++LANDPDADR+ +AEKQ GEWRVF+GNE+GAL+ WW++T+W++ N + A K 
Sbjct: 298 DKNGSTVILANDPDADRIQMAEKQKDGEWRVFTGNEMGALITWWIWTNWRKANPNADASK 357 

Query: 371 DTYMLSSTVSSKILRAIALKEGFHFEETLTGFKWMGNRAKQLIDQGKTVLFAFEEAIGYM 430 

Y+L+S VSS+I++ IA EGF E TLTGFKWMGNRA++L G V+ A+EE+IGYM 
Sbjct: 358 -VYILNSAVSSQIVKTIADAEGFKNETTLTGFKWMGNRAEELRADGNQVILAWEESIGYM 416 

Query: 431 CCP-FVLDKDGVSAAVISAELASFLATKNLSLSQQLKAIYVEYGYHITKASYFICHDQET 489 

P +DKDGVSAA + AE+A+FL + SL QL A+Y YG+H+ +++Y++ E 
Sbjct: 417 --PGHTMDKDGVSAAAVFAEIAAFLHABGKSLQDQLYALYNRYGFHLVRSTYWMVPAPEV 474 

Query: 490 IKKLFENLRNYDGKNNYPKACGKFEISAIRDLTTGYDDSQPDKKAVLPTSKSSQMITFTF 549 

KKLF LR D K +P G+ E++++RDLT GYD+S+PD K VLP S SS+M+TF 
Sbjct: 475 TKKLFSTLRA-DLK--FPTKIGEAEVASVRDLTIGYDNSKPDNKPVLPLSTSSEMVTFFL 531 

Query: 550 ANGGVATMRTSGTEPKIKYYAELCAPPGNS--DPEQLKKELNELVSAIEEHFFQPQKYNL 607 

G V T+R SGTEPKIKYY EL PG + D E + E+++L + + PQ++ L 

Sbjct: 532 KTGSVTTLRASGTEPKIKYYIELITAPGKTQNDLESVISEMDQLEKDWATLLRPQQFGL 591 

Query: 608 QPK 610 
P+ 

Sbjct: 592 IPR 594 



Pedant information for DKFZphfkd2_24bl5, frame 1 



Report for DKFZphfkd2_24bl5.1 



[LENGTH] 612 

[MW] 68311.58 

[pI] 6.28 

[HOMOL] TREMBL:CEY43F4B_5 gene: "Y43F4B.5"; Caenorhabditis elegans cosmid Y43F4B 1e-157 

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YMR278w] 1e-111 

[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI0740] 3e-66 

[FUNCAT] c energy conversion [M. genitalium, MG053] 4e-50 

[FUNCAT] m outer membrane and cell wall [H. influenzae, HI1463] 2e-04 

[BLOCKS] BL00607D cAMP phosphodiesterases class-II proteins 

[BLOCKS] BL00710 Phosphoglucomutase and phosphomannomutase phosphoserine signa 

[EC] 5.4.2.8 Phosphomannomutase 3e-56 

[EC] 5.4.2.2 Phosphoglucomutase 1e-09 

[PIRKW] isomerase 3e-56 

[PIRKW] intramolecular transferase 3e-56 

[SUPFAM] Methanobacterium thermoautotrophicum phosphomannomutase 1e-06 

[SUPFAM] probable phosphorylating protein ureC 9e-06 

[PROSITE] PGM_PMM 1 

[PROSITE] MYRISTYL 10 

[PROSITE] LIPOCALIN 2 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 8 

[PROSITE] ASN_GLYCOSYLATION 1 

[PFAM] Phosphoglucomutase and phosphomannomutase phosphoserine 

[KW] Alpha_Beta 



SEQ MAAPEGSGLGEDARLDQETAQWLRWDKNSLTLEAVKRLIAEGNKEELRKCFGARMEFGTA 

PRD ccccccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhcchhhhhhhhhhhhccccc 

SEQ GLRAAMGPGISRMNDLTIIQTTQGFCRYLEKQFSDLKQKGIVISFDARAHPSSGGSSRRF 

PRD cccccccccccccceeeeeehhhhhhhhhhhhcccccceeeeeecccccccccccchhhh 

SEQ ARLAATTFISQGIPVYLFSDITPTPFVPFTVSHLKLCAGIMITASHNPKQDNGYKVYWDN 

PRD hhhhhhhhhhccceeeeeccccccccchhhhhhhcccceeeeeeccccccccceeeeecc 

SEQ GAQIISPHDKGISQAIEENLEPWPQAWDDSLIDSSPLLHNPSASINNDYFEDLKKYCFHR 

PRD ccccccccchhhhhhhhhhhhhhcccccccccccccccccccchhhhhhhhhhhhhhhcc 

SEQ SVNRETKVKFVHTSVHGVGHSFVQSAFKAFDLVPPEAVPEQRDPDPEFPTVKYPNPEEGK 

PRD ccccccceeeeeeeccccccchhhhhhhhhcccccccccccccccccccccccccccchh 

SEQ GVLTLSFALADKTKARIVLANDPDADRLAVAEKQDSGEWRVFSGNELGALLGWWLFTSWK 

PRD hhhhhhhhhhhhhcceeeeeccccccceeeeecccccceeeecccchhhhhhhhhhhhhh 

SEQ EKNQDRSALKDTYMLSSTVSSKILRAIALKEGFHFEETLTGFKWMGNRAKQLIDQGKTVL 

PRD hcccccccccceeeeeeeehhhhhhhhhhhcccceeeeeccccchhhhhhhhhhccceee 
SEQ FAFEEAIGYMCCPFVLDKDGVSAAVISAELASFLATKNLSLSQQLKAIYVEYGYHITKAS 

PRD hhhhhccccccccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhcccccccc 

SEQ YFICHDQETIKKLFENLRNYDGKNNYPKACGKFEISAIRDLTTGYDDSQPDKKAVLPTSK 

PRD eeeccchhhhhhhhhhhhhhhcccccccccchhhhhhhcccccccccccccccccccccc 

SEQ SSQMITFTFANGGVATMRTSGTEPKIKYYAELCAPPGNSDPEQLKKELNELVSAIEEHFF 

PRD ccceeeeeecccceeeeecccccccceeeeeeccccccchhhhhhhhhhhhhhhhhhhhh 



SEQ QPQKYNLQPKAD 

PRD cccccccccccc 
PS00001  458->462  ASN_GLYCOSYLATION  PDOC00001
PS00002  7->11     GLYCOSAMINOGLYCAN  PDOC00002
PS00005  116->119  PKC_PHOSPHO_SITE   PDOC00005
PS00005  117->120  PKC_PHOSPHO_SITE   PDOC00005
PS00005  290->293  PKC_PHOSPHO_SITE   PDOC00005
PS00005  358->361  PKC_PHOSPHO_SITE   PDOC00005
PS00005  380->383  PKC_PHOSPHO_SITE   PDOC00005
PS00005  489->492  PKC_PHOSPHO_SITE   PDOC00005
PS00005  538->541  PKC_PHOSPHO_SITE   PDOC00005
PS00005  556->559  PKC_PHOSPHO_SITE   PDOC00005
PS00006  186->190  CK2_PHOSPHO_SITE   PDOC00006
PS00006  210->214  CK2_PHOSPHO_SITE   PDOC00006
PS00006  343->347  CK2_PHOSPHO_SITE   PDOC00006
PS00006  358->362  CK2_PHOSPHO_SITE   PDOC00006
PS00006  523->527  CK2_PHOSPHO_SITE   PDOC00006
PS00006  528->532  CK2_PHOSPHO_SITE   PDOC00006
PS00006  560->564  CK2_PHOSPHO_SITE   PDOC00006
PS00006  579->583  CK2_PHOSPHO_SITE   PDOC00006
PS00006  593->597  CK2_PHOSPHO_SITE   PDOC00006
PS00008  6->12     MYRISTYL           PDOC00008
PS00008  61->67    MYRISTYL           PDOC00008
PS00008  100->106  MYRISTYL           PDOC00008
PS00008  159->165  MYRISTYL           PDOC00008
PS00008  191->197  MYRISTYL           PDOC00008
PS00008  257->263  MYRISTYL           PDOC00008
PS00008  344->350  MYRISTYL           PDOC00008
PS00008  348->354  MYRISTYL           PDOC00008
PS00008  440->446  MYRISTYL           PDOC00008
PS00008  552->558  MYRISTYL           PDOC00008
PS00710  159->174  PGM_PMM            PDOC00589
PS00213  346->358  LIPOCALIN          PDOC00187
PS00213  344->358  LIPOCALIN          PDOC00187



Pfam for DKFZphfkd2_24bl5.1 



HMM_NAME Phosphoglucomutase and phosphomannomutase phosphoserine 

HMM *GvnVIdIGQNGMMPTPMIYFaIRTYKhmcmggGIMITaSHNPGGPDnDN 
G+ V + ++PTP + F + H+++ +GIMITASHNP DN 
Query 132 GIPVYLFS--DITPTPFVPFTVSHLKLCAGIMITASHNP--KQ-DN 172 

HMM GIK* 
G+K 

Query 173 GYK 175 

DKFZphfkd2_24e23 



group: kidney derived 

DKFZphfkd2_24e23 encodes a novel 198 amino acid protein without similarity to 
known proteins. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of 
kidney-specific genes. 



unknown 

complete cDNA, complete cds, 1 EST hit, 
many ATGs in front of the ORF 

Sequenced by GBF 

Locus: unknown 

Insert length: 1723 bp 

Poly A stretch at pos. 1695, no polyadenylation signal found 



1 GGGGGATTTT CGATCATGAC AACGATAGCA ATTGATATAC CTTCAAAATA 

51 CGTGTCCAGT GAGTGTTGAT TGTGTGTGGT TTCTCTAGGA GACCGTGTTC 

101 ATGCAACACA GCATTATTTC ACCGCCTTTA CCCCAGCTTC TTCATACACA 

151 TGCACTTGTC AAGGGCTCTT TGGCTGAAGA GAAGTTAGAA GTTTCCAGAT 

201 ATGGAGGGGT ATTTTCAGCA GATATGCCCA CCGCCATGGT TTTGTCAGCT 

251 CTGTAGGGTG GTCTTGCACC CTGCTCACTG CTGGCATCAC CTGAGCCTAT 

301 GGCAGATACC CAGTGCTGCC CGCCACCATG TGAATTCATC AGCTCTGCAG 

351 GCACAGACCT TGCACTAGGA ATGGGCTGGG ACGCCACCCT CTGCCTCTTA 

401 CCATTCACTG GGTTTGGCAA GTGTGCTGGG ATCTGGAATC ACATGGATGA 

451 GGAACCCGAT AATGGTGACG ACCGAGGTAG CAGGCGAACC ACTGGCCAGG 

501 GCAGGAAGTG GGCAGCTCAC GGGACTATGG CTGCACCGCG GGTTCATACC 

551 GACTACCATC CTGGAGGTGG GAGCGCATGC TCATCTGTAA AAGTCCGGTC 

601 CCACGTTGGA CACACCGGGG TCTTCTTCTT TGTTGACCAG GATCCTCTGG 

651 CAGTGTCTTT AACAAGCCAG AGTCTGATCC CACCGCTCAT AAAGCCAGGG 

701 TTGTTGAAAG CTTGGGGCTT CCTCCTCCTC TGTGCGCAGC CCTCAGCAAA 

751 CGGTCACAGC CTGTGCTGTC TGCTGTACAC CGACTTGGTA TCATCCCATG 

801 AACTGTCCCC CTTTCGTGCT CTGTGCTTAG GGCCCTCTGA TGCCCCATCT 

851 GCCTGCGCTT CCTGCAACTG TTTAGCAAGC ACCTATTATC TATAGGGTGC 

901 TGGGGTGCTG GGCGAGGCCA ATCGCTCCTA TTACTTTCTG CCCTGGGGAC 

951 GTCCTGTTTT CCCACCTACC CCTGTAACGC CTCTGCTCTG CCTTCCCATC 

1001 TGCGGGGCTA ACGCCATCCC ACAAGGGCTG GGCTGTCCGT TCAGAAGAGA 

1051 AACTGGGAAG GGGCCTTGAG GACCTGTGTC CAGGCAGGGT GGACAAGGGC 

1101 TTTGTGCAGG GAGCTCCTCT CCCATCTTTG TGTCCTGACA GCCGTGACCG 

1151 TGACCCCTCA AAGCAGAGCC AGTAGTGATC AGTATCCTGC TGCTTCAAGC 

1201 CTGCACGGTC CTCTTCTCCT CTCCGCACAT CTGCATGCCT GTCAAACCCA 

1251 GAGTAGTTTG GGGCCTGGTA AACAGAGGGA AGTTGGCTGG AGGAGGCCAG 

1301 TCAGGAGTGC AAGAACCCCG CGTACTCTGT CCCACGTGGA TAAAGTCTCT 

1351 AATTCCAGTC TGAGGTGAAT TCTTAGAGAG TGCTTTCATT TAATGTTTGC 

1401 TTTATGCATT TCCCCTGCAG CTGTGACTAA TTGTGGAACA GCATACATTT 

1451 TGTTTTGAGA CTCTCTTGAG ATTTTTCTGG CAGTGTAAGG TCTACACCAT 

1501 TTTCCTCTCA GCATCAGAGA AGGCAGAAAG CAAGAGAAAG GAATGCAATG 

1551 TGAGCAAGGC CAGGCACACT TGTGCTACTG CAGTTGGCAA GAATGGAGTC 

1601 TAATCCCAGC ACTTTGGGAG GCCGAGGCGG GTGGATCACC TGAGGTCAGG 

1651 AATTTGAGAC CAACCTGGCC AACATGTTGA AACCTCGTCT GTACTAAAAA 

1701 TACAAAAAAA AAAAAAAAAA AAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 
ORF from 299 bp to 892 bp; peptide length: 198 
Category: putative protein 



1 MADTQCCPPP CEFISSAGTD LALGMGWDAT LCLLPFTGFG KCAGIWNHMD 
51 EEPDNGDDRG SRRTTGQGRK WAAHGTMAAP RVHTDYHPGG GSACSSVKVR 
101 SHVGHTGVFF FVDQDPLAVS LTSQSLIPPL IKPGLLKAWG FLLLCAQPSA 
151 NGHSLCCLLY TDLVSSHELS PFRALCLGPS DAPSACASCN CLASTYYL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_24e23, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_24e23, frame 2 



Report for DKFZphfkd2_24e23.2 



[LENGTH] 198 

[MW] 20948.98 

[pI] 6.01 

[PROSITE] MYRISTYL 5 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 2 

[KW] All_Beta 

[KW] LOW_COMPLEXITY 6.06 



SEQ MADTQCCPPPCEFISSAGTDLALGMGWDATLCLLPFTGFGKCAGIWNHMDEEPDNGDDRG 
PRD ccccccccccccccccccccccccccccceeeeeccccccceeeeccccccccccccccc 

SEQ SRRTTGQGRKWAAHGTMAAPRVHTDYHPGGGSACSSVKVRSHVGHTGVFFFVDQDPLAVS 
PRD cccccccccccccccccccceeeeecccccccccceeeeeeeccccceeeeeccccceee 

SEQ LTSQSLIPPLIKPGLLKAWGFLLLCAQPSANGHSLCCLLYTDLVSSHELSPFRALCLGPS 
SEG xxxxxxxxxxxx 
PRD eccccccccccccchhhhhhhhhhhccccccccceeeeeeeeeccccccccceeeecccc 

SEQ DAPSACASCNCLASTYYL 
PRD cccccccccccccccccc 
PS00004  62->66    CAMP_PHOSPHO_SITE  PDOC00004
PS00005  61->64    PKC_PHOSPHO_SITE   PDOC00005
PS00005  96->99    PKC_PHOSPHO_SITE   PDOC00005
PS00006  165->169  CK2_PHOSPHO_SITE   PDOC00006
PS00008  18->24    MYRISTYL           PDOC00008
PS00008  60->66    MYRISTYL           PDOC00008
PS00008  89->95    MYRISTYL           PDOC00008
PS00008  91->97    MYRISTYL           PDOC00008
PS00008  134->140  MYRISTYL           PDOC00008
PS00009  67->71    AMIDATION          PDOC00009



(No Pfam data available for DKFZphfkd2_24e23.2) 
DKFZphfkd2_24n20 



group: intracellular transport and trafficking 

DKFZphfkd2_24n20.3 encodes a novel 366 amino acid protein with similarity to human eps8 
binding protein e3B1 and spectrins. 

The new protein contains a Src homology 3 (SH3) domain and is similar to human eps8 SH3 domain 
binding protein 1 (e3B1) and spectrins. Eps8 is a substrate of receptor tyrosine kinases 
involved in mitogenic signaling. Spectrin is part of the submembrane cytoskeletal network in 
the human erythrocyte ghost. Nonerythroid spectrins are proposed to have roles in cell 
adhesion, establishment of cell polarity, and attachment of other cytoskeletal structures to 
the plasma membrane. The new protein therefore appears to be part of the signaling pathway 
between tyrosine kinases and the membrane/cytoskeleton. 

The new protein can find application in modulating cell adhesion/motility and membrane/cytoskeleton 
structure and dynamics. 



strong similarity to eps8 binding protein e3B1 
complete cDNA, complete cds, few EST hits 

potential start at bp 300, but there are ATGs in other frames in 
5' region of the cDNA 

Sequenced by GBF 

Locus: /map="17" 

Insert length: 1719 bp 

Poly A stretch at pos. 1699, polyadenylation signal at pos. 1680 



1 GGGGACAGCT GCCCCGACCT TGGCTTCCTC TGCTGGGTGG GATTGGGGGC 
51 TGGGCCCCCA AATGGGCCCC TGGCTTCCCC CTTCCTCTGG GCAGGGGACA 
101 GAGAGACACA GGCTCGGGGA GCAGGACTGA CTTCCTCTTG TCCCGGAATG 
151 AGCATGCCTG CCCTTTGCAA GCAGGTTTGG GTCTCACGCA GAGGAAACCA 
201 AAAGCAATAA GAGGGAGGGA AGGCAGAGCA ACCAATCAAG GGCAGGGTGA 
251 GACTCAAAAC GAGCGGGCTC CCTGGGGAGC CAGACAGAGG CTGGGGGTGA 
301 TGGCGGAGCT ACAGCAGCTG CAGGAGTTTG AGATCCCCAC TGGCCGGGAG 
351 GCTCTGAGGG GCAACCACAG TGCCCTGCTG CGGGTCGCTG ACTACTGCGA 
401 GGACAACTAT GTGCAGGCCA CAGACAAGCA GAAGGCGCTG GAGGAGACCA 
451 TGGCCTTCAC TACCCAGGCA CTGGCCAGCG TGGCCTACCA GGTGGGCAAC 
501 CTGGCCGGGC ACACTCTGCG CATGTTGGAC CTGCAGGGGG CCGCCCTGCG 
551 GCAGGTGGAA GCCCGTGTAA GCACGCTGGG CCAGATGGTG AACATGCATA 
601 TGGAGAAGGT GGCCCGAAGG GAGATCGGCA CCTTAGCCAC TGTCCAGCGG 
651 CTGCCCCCCG GCCAGAAGGT CATCGCCCCA GAGAACCTAC CCCCTCTCAC 
701 GCCCTACTGC AGGAGACCCC TCAACTTTGG CTGCCTGGAC GACATTGGCC 
751 ATGGGATCAA GGACCTCAGC ACGCAGCTGT CAAGAACAGG CACCCTGTCT 
801 CGAAAGAGCA TCAAGGCCCC TGCCACACCC GCCTCCGCCA CCTTGGGGAG 
851 ACCGCCCCGG ATTCCCGAGC CAGTGCACCT GCCGGTGGTG CCCGACGGCA 
901 GACTCTCCGC CGCCTCCTCT GCGTCTTCCC TGGCCTCGGC CGGCAGCGCC 
951 GAAGGTGTCG GTGGGGCCCC CACGCCCAAG GGGCAGGCAG CACCTCCAGC 
1001 CCCACCTCTC CCCAGCTCCT TGGACCCACC TCCTCCACCA GCAGCCGTCG 
1051 AGGTGTTCCA GCGGCCTCCC ACGCTGGAGG AGTTGTCCCC ACCCCCACCG 
1101 GACGAAGAGC TGCCCCTGCC ACTGGACCTG CCTCCTCCTC CACCCCTGGA 
1151 TGGAGATGAA TTGGGGCTGC CTCCACCCCC ACCAGGATTT GGGCCTGATG 
1201 AGCCCAGCTG GGTGCCTGCC TCATACTTGG AGAAAGTGGT GACACTGTAC 
1251 CCATACACCA GCCAGAAGGA CAATGAGCTC TCCTTCTCTG AGGGCACTGT 
1301 CATCTGTGTC ACTCGCCGCT ACTCCGATGG CTGGTGCGAG GGCGTCAGCT 
1351 CGGAGGGGAC TGGATTCTTC CCTGGGAACT ATGTGGAGCC CAGCTGCTGA 
1401 CAGCCCAGGG CTCTCTGGGC AGCTGATGTC TGCACTGAGT GGGTTTCATG 
1451 AGCCCCAAGC CAAAACCAGC TCCAGTCACA GCTGGACTGG GTCTGCCCAC 
1501 CTCTTGGGCT GTGAGCTGTG TTCTGTCCTT CCTCCCATCG GAGGGAGAAG 
1551 GGGTCCTGGG GAGAGAGAAT TTATCCAGAG GCCTGCTGCA GATGGGGAAG 
1601 AGCTGGAAAC CAAGAAGTTT GTCAACAGAG GACCCCTACT CCATGCAGGA 
1651 CAGGGTCTCC TGCTGCAAGT CCCAACTTTG AATAAAACAG ATGATGTCCA 
1701 AAAAAAAAAA AAAAAAAAA 



BLAST Results 



Entry AC004797 from database EMBL: 

Homo sapiens chromosome 17, clone hRPC.62_0_9, complete sequence. 
Score = 2316, P = 5.9e-255, identities = 464/465 
7 exons bp 93317-110902 
Medline entries 



97163405: 

Isolation and characterization of e3B1, an eps8 binding 
protein that regulates cell growth. 

98256293: 

Identification of a candidate human spectrin Src homology 3 
domain-binding protein suggests a general mechanism of 
association of tyrosine kinases with the spectrin-based 
membrane skeleton. 



Peptide information for frame 3 



ORF from 300 bp to 1397 bp; peptide length: 366 
Category: strong similarity to known protein 



1 MAELQQLQEF EIPTGREALR GNHSALLRVA DYCEDNYVQA TDKQKALEET 
51 MAFTTQALAS VAYQVGNLAG HTLRMLDLQG AALRQVEARV STLGQMVNMH 
101 MEKVARREIG TLATVQRLPP GQKVIAPENL PPLTPYCRRP LNFGCLDDIG 
151 HGIKDLSTQL SRTGTLSRKS IKAPATPASA TLGRPPRIPE PVHLPVVPDG 
201 RLSAASSASS LASAGSAEGV GGAPTPKGQA APPAPPLPSS LDPPPPPAAV 
251 EVFQRPPTLE ELSPPPPDEE LPLPLDLPPP PPLDGDELGL PPPPPGFGPD 
301 EPSWVPASYL EKVVTLYPYT SQKDNELSFS EGTVICVTRR YSDGWCEGVS 
351 SEGTGFFPGN YVEPSC 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_24n20, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_24n20, frame 3 



Report for DKFZphfkd2_24n20.3 



[LENGTH] 366 

[MW] 38947.21 

[pi] 4.93 
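The [MW] field is a molecular weight computed from the predicted peptide. As a minimal sketch, assuming average residue masses of the kind tabulated by ExPASy ProtParam (the exact table Pedant used is not stated, so results may differ slightly from the reported values), the calculation is a sum of residue masses plus one water for the free termini:

```python
# Average residue masses in daltons (assumed ExPASy-style values;
# Pedant's own mass table is not given in the report).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N-terminal H and C-terminal OH


def average_mw(peptide: str) -> float:
    """Average molecular weight of an unmodified linear peptide."""
    seq = peptide.upper().rstrip("*")  # drop a trailing stop symbol, if any
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(average_mw("G"), 2))  # 75.07, free glycine
```

Post-translational modifications (phosphorylation, myristoylation) are ignored, which is consistent with a value computed purely from sequence.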

[HOMOL] TREMBL:U87166_1 gene: "SSH3BP1"; product: "spectrin SH3 domain binding protein 1"; Homo sapiens spectrin SH3 domain binding protein 1 (SSH3BP1) mRNA, complete cds. 3e-48 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YGR136w] 9e-06 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YGR136w] 9e-06 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR154w] 3e-05 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR388w] 2e-04 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR388w] 2e-04 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDR162c] 4e-04 

[BLOCKS] BL50002B Src homology 3 (SH3) domain proteins profile 

[SUPFAM] SH3 homology 6e-17 

[PROSITE] MYRISTYL 6 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 6 

[PROSITE] PKC_PHOSPHO_SITE 8 

[PROSITE] ASN_GLYCOSYLATION 1 

[PFAM] Src homology domain 3 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 24.04 % 



SEQ MAELQQLQEFEIPTGREALRGNHSALLRVADYCEDNYVQATDKQKALEETMAFTTQALAS 

SEG 

laboA 

SEQ VAYQVGNLAGHTLRMLDLQGAALRQVEARVSTLGQMVNMHMEKVARREIGTLATVQRLPP 

SEG 

laboA 
SEQ GQKVIAPENLPPLTPYCRRPLNFGCLDDIGHGIKDLSTQLSRTGTLSRKSIKAPATPASA 

SEG 

laboA 

SEQ TLGRPPRIPEPVHLPVVPDGRLSAASSASSLASAGSAEGVGGAPTPKGQAAPPAPPLPSS 

SEG xxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxx 

laboA 

SEQ LDPPPPPAAVEVFQRPPTLEELSPPPPDEELPLPLDLPPPPPLDGDELGLPPPPPGFGPD 

SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

laboA 

SEQ EPSWVPASYLEKVVTLYPYTSQKDNELSFSEGTVICVTRRYSDGWCEGVSSEGTGFFPGN 

SEG xx 

laboA EECCCBCCCTTTBCCBTTTEEEEEEEETTTTEEEEEETTEEEEEEGG 

SEQ YVEPSC 

SEG 

laboA GEEE. . 
PS00001  22->26    ASN_GLYCOSYLATION  PDOC00001
PS00004  339->343  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  14->17    PKC_PHOSPHO_SITE   PDOC00005
PS00005  41->44    PKC_PHOSPHO_SITE   PDOC00005
PS00005  72->75    PKC_PHOSPHO_SITE   PDOC00005
PS00005  167->170  PKC_PHOSPHO_SITE   PDOC00005
PS00005  170->173  PKC_PHOSPHO_SITE   PDOC00005
PS00005  225->228  PKC_PHOSPHO_SITE   PDOC00005
PS00005  321->324  PKC_PHOSPHO_SITE   PDOC00005
PS00005  338->341  PKC_PHOSPHO_SITE   PDOC00005
PS00006  14->18    CK2_PHOSPHO_SITE   PDOC00006
PS00006  239->243  CK2_PHOSPHO_SITE   PDOC00006
PS00006  258->262  CK2_PHOSPHO_SITE   PDOC00006
PS00006  308->312  CK2_PHOSPHO_SITE   PDOC00006
PS00006  321->325  CK2_PHOSPHO_SITE   PDOC00006
PS00006  328->332  CK2_PHOSPHO_SITE   PDOC00006
PS00008  21->27    MYRISTYL           PDOC00008
PS00008  66->72    MYRISTYL           PDOC00008
PS00008  94->100   MYRISTYL           PDOC00008
PS00008  110->116  MYRISTYL           PDOC00008
PS00008  215->221  MYRISTYL           PDOC00008
PS00008  332->338  MYRISTYL           PDOC00008
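The PROSITE hits above are reported as start->end, where end is one past the last matched residue (the four-residue N-F-T-D glycosylation site of DKFZphfkd2_3i13 starting at residue 23 is listed as 23->27). A minimal sketch of such a scan, assuming the standard PROSITE patterns for PS00001, PS00004, PS00005, PS00006 and PS00008 rewritten as Python regular expressions; the real scanner may apply extra filtering rules, so counts need not match the reports exactly:

```python
from __future__ import annotations

import re

# PROSITE patterns translated to regex syntax (assumed translations).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",          # PS00001: N-{P}-[ST]-{P}
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",           # PS00004: [RK](2)-x-[ST]
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",               # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",            # PS00006: [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",   # PS00008
}


def scan_sites(seq: str) -> dict[str, list[tuple[int, int]]]:
    """Return 1-based (start, end) pairs with end = start + motif length."""
    hits: dict[str, list[tuple[int, int]]] = {}
    for name, pattern in PATTERNS.items():
        found = []
        # A zero-width lookahead lets overlapping motifs (e.g. 195->201
        # and 197->203) be reported separately.
        for m in re.finditer(f"(?=({pattern}))", seq):
            start = m.start() + 1
            found.append((start, start + len(m.group(1))))
        hits[name] = found
    return hits


for name, sites in scan_sites("MAELQQLQEFEIPTGREALR").items():
    print(name, " ".join(f"{s}->{e}" for s, e in sites))
```

Run on the first 20 residues of DKFZphfkd2_24n20, this reproduces the 14->17 PKC and 14->18 CK2 entries.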
HMM_NAME  Src homology domain 3

HMM        *pyVIALYDYqAqdpDELSFkEGDIIillEdsDD.WWrgRnnnTNGQEGW
           ++V+ LY+Y++Q ++ELSF EG +I + + D W++G + +G+
Query  311 EKVVTLYPYTSQKDNELSFSEGTVICVTRRYSDGWCEGVSSEGTGF 356

HMM        IPSNYVEPX*
           +P NYVEP
Query  357 FPGNYVEPS 365
DKFZphfkd2_24p5 



group: intracellular transport and trafficking 

DKFZphfkd2_24p5 encodes a novel 811 amino acid protein that is a splice variant of 
human ankyrin G. 

The ankyrin 3 gene (ANK3) encodes ankyrin G, which is expressed in multiple tissues, with very 
high expression at the axonal initial segment and nodes of Ranvier of neurons in the central 
and peripheral nervous systems. ANK3 transcripts undergo several forms of tissue-specific 
alternative mRNA processing. The resulting ankyrin G isoforms participate in maintaining and 
targeting ion channels and cell adhesion molecules at nodes of Ranvier and axonal initial segments. 

The new protein can find application in modulating the structure and membrane topology of 
nodes of Ranvier and other neuronal cell membranes. 



Human ankyrin G (ANK-3) new splice variant 
splice variant 

potential frame shift at 2720 was checked 
see BLASTX 

Sequenced by EMBL 

Locus: /map="10q21" 

Insert length: 3470 bp 

Poly A stretch at pos. 3459, no polyadenylation signal found 



1 AGCTTTAAAA GGATGTCTGC GAAGTGGTCA AAAGGATCTT AACCTCAATT 
51 AAGTGGGGTT TTTTAAAAAG ATTTTTTGGG GGGCCTGAAA TTTTGAAAAT 
101 CTTCGAACTC TGAGTGGGGA AAGATGTATA ATTCCTCAAT TGCCTACGAG 
151 GATATCAAGA TGCTGAGAGG AATTCAGCGG TGGTGAAGAG AGTGGATACA 
201 AACCAGGGAT TGGTTTCCTT GAGCTGTTTT GGAGGTTGAT TCTAAATCAC 
251 TGCTTAAGGA ATTCCTGGAA ACATCAGGAA AACATTTGAT CATCCAAGCC 
301 TAGTGGAAAT GGCTTTACCG CAGAGTGAAG ATGCAATGAC CGGGGACACA 
351 GACAAATATC TTGGGCCACA GGACCTTAAG GAATTGGGTG ATGATTCCCT 
401 GCCTGCAGAG GGTTACATGG GCTTTAGTCT CGGAGCGCGT TCTGCCAGCC 
451 TCCGCTCCTT CAGTTCGGAT GGGTCTTACA CCTTGAACAG AAGCTCCTAT 
501 GCACGGGACA GCATGATGAT TGAAGAACTC CTCGTGCCAT CCAAAGAGCA 
551 GCATCTAACA TTCACAAGGG AATTTGATTC AGATTCTCTT AGACATTACA 
601 GCTGGGCTGC AGACACCTTA GACAATGTCA ATCTTGTTCC AAGCCCCATT 
651 CATTCTGGGT TTCTGGTTAG CTTTATGGTG GACGCGAGAG GGGGCTCCAT 
701 GAGAGGAAGC CGTCATCACG GGATGAGAAT CATCATTCCT CCACGCAAGT 
751 GTACGGCCCC CACTCGAATC ACCTGCCGTT TGGTAAAGAG ACATAAACTG 
801 GCCAACCCAC CCCCCATGGT GGAAGGAGAG GGATTAGCCA GTAGGCTGGT 
851 AGAAATGGGT CCTGCAGGGG CACAATTTTT AGGCCCTGTC ATAGTGGAAA 
901 TCCCTCACTT TGGGTCCATG AGAGGAAAAG AGAGAGAACT CATTGTTCTT 
951 CGAAGTGAAA ATGGTGAAAC TTGGAAGGAG CATCAGTTTG ACAGCAAAAA 
1001 TGAAGATTTA ACCGAGTTAC TTAATGGCAT GGATGAAGAA CTTGATAGCC 
1051 CAGAAGAGTT AGGGAAAAAG CGTATCTGCA GGATTATCAC GAAAGATTTC 
1101 CCCCAGTATT TTGCAGTGGT TTCCCGGATT AAGCAGGAAA GCAACCAGAT 
1151 TGGTCCTGAA GGTGGAATTC TGAGCAGCAC CACAGTGCCC CTTGTTCAAG 
1201 CATCTTTCCC AGAGGGTGCC CTAACTAAAA GAATTCGAGT GGGCCTCCAG 
1251 GCCCAGCCTG TTCCAGATGA AATTGTGAAA AAGATCCTTG GAAACAAAGC 
1301 AACTTTTAGC CCAATTGTCA CTGTGGAACC AAGAAGACGG AAATTCCATA 
1351 AACCAATCAC AATGACCATT CCGGTGCCCC CGCCCTCAGG AGAAGGTGTA 
1401 TCCAATGGAT ACAAAGGGGA CACTACACCC AATCTGCGTC TTCTCTGTAG 
1451 CATTACAGGG GGCACTTCGC CTGCTCAGTG GGAAGACATC ACAGGAACAA 
1501 CTCCTTTGAC GTTTATAAAA GATTGTGTCT CCTTTACAAC CAATGTTTCA 
1551 GCCAGATTTT GGCTTGCAGA CTGCCATCAA GTTTTAGAAA CTGTGGGGTT 
1601 AGCCACGCAA CTGTACAGAG AATTGATATG TGTTCCATAT ATGGCCAAGT 
1651 TTGTTGTTTT TGCCAAAATG AATGATCCCG TAGAATCTTC CTTGCGATGT 
1701 TTCTGCATGA CAGATGACAA AGTGGACAAA ACTTTAGAGC AACAAGAGAA 
1751 TTTTGAGGAA GTCGCAAGAA GCAAAGATAT TGAGGTTCTG GAAGGAAAAC 
1801 CTATTTATGT TGATTGTTAT GGAAATTTGG CCCCACTTAC CAAAGGAGGA 
1851 CAGCAACTTG TTTTTAACTT TTATTCTTTC AAAGAAAATA GACTGCCATT 
1901 TTCCATCAAG ATTAGAGACA CCAGCCAAGA GCCCTGTGGT CGTCTGTCTT 
1951 TTCTGAAAGA ACCAAAGACA ACAAAAGGAC TGCCTCAAAC AGCGGTTTGC 
2001 AACTTAAATA TCACTCTGCC AGCACATAAA AAGATTGAGA AAACAGATGG 
2051 ACGACAGAGC TTCGCATCCT TAGCTTTACG TAAGCGCTAC AGCTACTTGA 
2101 CTGAGCCTGG AATGAGTCCA CAGAGTCCAT GTGAACGGAC AGATATCAGG 
2151 ATGGCAATAG TAGCCGATCA CCTGGGACTT AGTTGGACAG AACTGGCAAG 
2201 GGAACTGAAT TTTTCAGTGG ATGAAATCAA TCAAATACGT GTGGAAAATC 
2251 CAAATTCTTT AATTTCTCAG AGCTTCATGT TTTTAAAAAA ATGGGTTACC 
2301 AGAGACGGAA AAAATGCCAC AACTGATGCC TTAACTTCGG TCTTGACAAA 
2351 AATTAATCGA ATAGATATAG TGACACTGCT AGAAGGACCA ATATTTGATT 
2401 ATGGAAATAT TTCAGGCACC AGAAGTTTTG CAGATGAGAA CAATGTTTTC 
2451 CATGACCCTG TTGATGGTTA TCCTTCCCTT CAAGTGGAAC TGGAAACCCC 
2501 CACAGGGTTG CACTACACAC CACCTACCCC TTTCCAGCAA GATGATTATT 
2551 TTAGTGATAT CTCTAGCATA GAATCTCCCC TTAGAACCCC TAGTAGACTG 
2601 AGTGATGGGC TAGTGCCTTC CCAGGGGAAC ATAGAGCATT CCGCAGATGG 
2651 ACCTCCAGTC GTAACTGCAG AAGACGCTTC CTTAGAAGAC AGCAAACTGG 
2701 AAGACTCAGT GCCTTTAACA GAAATGCCTG AAGCAGTGAT GTAGATGAGA 
2751 GCCAGTTGGA GAATGTATGT CTGAGTTGGC AGAATGAGAC ATCAAGTGGA 
2801 AACCTAGAGT CCTGCGCTCA AGCTCGAAGA GTAACTGGTG GGTTACTAGA 
2851 TCGACTGGAT GACAGCCCTG ACCAGTGTAG AGATTCCATT ACCTCATATC 
2901 TCAAAGGAGA AGCTGGCAAA TTTGAAGCAA ATGGAAGCCA TACAGAAATC 
2951 ACTCCAGAAG CAAAGACAAA ATCTTACTTT CCAGAATCCC AAAATGATGT 
3001 AGGAAAACAG AGTACCAAGG AAACTCTGAA ACCAAAAATA CATGGATCTG 
3051 GTCATGTTGA AGAACCAGCA TCACCACTAG CAGCATATCA GAAATCTCTA 
3101 GAAGAAACCA GCAAGCTTAT AATAGAAGAG ACTAAACCCT GTGTGCCTGT 
3151 CAGTATGAAA AAGATGAGTA GGACTTCTCC AGCAGATGGC AAGCCAAGGC 
3201 TTAGCCTCCA TGAAGAAGAG GGGTCCAGTG GGTCTGAGCA AAAGCAGGGA 
3251 GAAGGTTTTA AGGTGAAAAC GAAGAAAGAA ATCCGGCATG TGGAAAAGAA 
3301 GAGCCACTCG TAACAGCGAA CGGTCAGTCA AGGATCATAA GTTTTTACTG 
3351 CCAGTATTGA GAAATTCGTG GAAGAAATGT CAGCAGGAAG TAAAAATTCA 
3401 CCGAGAAGTG TGTGTGTGTT CGCTGCTTCC ACACATTAAT GGCATGATTT 
3451 TTTTTATGCA AAAAAAAAAA 



BLAST Results 



Entry MMANK3A_1 from database TREMBL: 

Ank3"; product: "ankyrin 3"; Mus mu... +3 4022 0.0 2 

Entry HS13616 from database EMBL: 

Human ankyrin G (ANK-3) mRNA, complete cds . 

Length = 14,770 

Plus Strand HSPs: 

Score = 8505 (1276.1 bits), Expect = 0.0, Sum P(3) = 0.0 
Identities = 1799/1873 (96%) 
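Identity figures such as 1799/1873 (96%) count aligned columns in which query and subject carry the same residue. Across these reports the percentages appear to be truncated rather than rounded (167/317 = 52.7% is shown as 52%). A minimal sketch of both counts, assuming the two aligned rows are given with '-' for gaps (positives are omitted because they need a substitution matrix):

```python
def alignment_stats(query: str, sbjct: str) -> tuple[int, int, int]:
    """Return (identities, gap columns, alignment length) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    pairs = list(zip(query, sbjct))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return identities, gaps, len(pairs)


def percent(part: int, whole: int) -> int:
    """Integer percentage, truncated toward zero as in the reports."""
    return 100 * part // whole


print(alignment_stats("QLLKEAEEEFW", "QLLREAEEEFW"))  # (10, 0, 11)
print(percent(1799, 1873))  # 96
```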



Medline entries 



95394457: 

Chromosomal localization of the ankyrinG gene 
(ANK3/Ank3) to human 10q21 and mouse 10. 

95138209: 

A new ankyrin gene with neural-specific isoforms localized at the 
axonal initial segment and node of Ranvier 



Peptide information for frame 3 



ORF from 309 bp to 2741 bp; peptide length: 811 
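The ORF coordinates are 1-based and inclusive and exclude the stop codon, so the peptide length is (end - start + 1) / 3, for example (2741 - 309 + 1) / 3 = 811. A minimal sketch, assuming the standard genetic code (the report does not name the translation table), that checks this arithmetic and translates the first codons of this ORF (bases 309-341 of DKFZphfkd2_24p5):

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame sequence; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def orf_length(start: int, end: int) -> int:
    """Number of codons in a 1-based, inclusive ORF span."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


print(translate("ATGGCTTTACCGCAGAGTGAAGATGCAATGACC"))  # MALPQSEDAMT
print(orf_length(309, 2741))  # 811
```

The same arithmetic gives the other reported lengths: 300-1397 yields 366 residues, 134-1351 yields 406 and 56-271 yields 72.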
Category: known protein 
Classification: unset 



1 MALPQSEDAM TGDTDKYLGP QDLKELGDDS LPAEGYMGFS LGARSASLRS 
51 FSSDGSYTLN RSSYARDSMM IEELLVPSKE QHLTFTREFD SDSLRHYSWA 
101 ADTLDNVNLV PSPIHSGFLV SFMVDARGGS MRGSRHHGMR IIIPPRKCTA 
151 PTRITCRLVK RHKLANPPPM VEGEGLASRL VEMGPAGAQF LGPVIVEIPH 
201 FGSMRGKERE LIVLRSENGE TWKEHQFDSK NEDLTELLNG MDEELDSPEE 
251 LGKKRICRII TKDFPQYFAV VSRIKQESNQ IGPEGGILSS TTVPLVQASF 
301 PEGALTKRIR VGLQAQPVPD EIVKKILGNK ATFSPIVTVE PRRRKFHKPI 
351 TMTIPVPPPS GEGVSNGYKG DTTPNLRLLC SITGGTSPAQ WEDITGTTPL 
401 TFIKDCVSFT TNVSARFWLA DCHQVLETVG LATQLYRELI CVPYMAKFVV 
451 FAKMNDPVES SLRCFCMTDD KVDKTLEQQE NFEEVARSKD IEVLEGKPIY 
501 VDCYGNLAPL TKGGQQLVFN FYSFKENRLP FSIKIRDTSQ EPCGRLSFLK 
551 EPKTTKGLPQ TAVCNLNITL PAHKKIEKTD GRQSFASLAL RKRYSYLTEP 
601 GMSPQSPCER TDIRMAIVAD HLGLSWTELA RELNFSVDEI NQIRVENPNS 
651 LISQSFMFLK KWVTRDGKNA TTDALTSVLT KINRIDIVTL LEGPIFDYGN 
701 ISGTRSFADE NNVFHDPVDG YPSLQVELET PTGLHYTPPT PFQQDDYFSD 
751 ISSIESPLRT PSRLSDGLVP SQGNIEHSAD GPPVVTAEDA SLEDSKLEDS 
801 VPLTEMPEAV M 



BLASTP hits 
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No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_24p5, frame 3 

TREMBL:MMANK3A_1 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (Ank3) 5kb isoform mRNA, complete cds., N = 1, 
Score = 4022, P = 0 

TREMBL:MMANK3B_3 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (7kb isoform) mRNA, complete cds., N = 1, Score = 
4005, P = 0 

TREMBL:MMANK3B_4 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (7kb isoform) mRNA, complete cds., N = 1, Score = 
4005, P = 0 



>TREMBL:MMANK3A_1 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (Ank3) 5kb isoform mRNA, complete cds. 
Length = 1,094 

HSPs: 

Score = 4022 (603.5 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 769/805 (95%), Positives = 783/805 (97%) 

Query: 1 MALPQSEDAMTGDTDKYLGPQDLKELGDDSLPAEGYMGFSLGARSASLRSFSSDGSYTLN 60 

MALP SEDA+TGDTDKYLGPQDLKELGDDSLPAEGY+GFSLGARSASLRSFSSD SYTLN 
Sbjct: 1 MALPHSEDAITGDTDKYLGPQDLKELGDDSLPAEGYVGFSLGARSASLRSFSSDRSYTLN 60 

Query: 61 RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLVPSPIHSGFLV 120 

RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLV SP+HSGFLV 
Sbjct: 61 RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLVSSPVHSGFLV 120 

Query: 121 SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 180 

SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 
Sbjct: 121 SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 180 

Query: 181 VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDLTELLNG 240 

VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDL ELLNG 
Sbjct: 181 VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDLAELLNG 240 

Query: 241 MDEELDSPEELGKKRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF 300 

MDEELDSPEELG KRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF 
Sbjct: 241 MDEELDSPEELGTKRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF 300 

Query: 301 PEGALTKRIRVGLQAQPVPDEIVKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS 360 

PEGALTKRIRVGLQAQPVP+E VKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS 
Sbjct: 301 PEGALTKRIRVGLQAQPVPEETVKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS 360 

Query: 361 GEGVSNGYKGDTTPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA 420 

GEGVSNGYKGD TPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA 
Sbjct: 361 GEGVSNGYKGDATPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA 420 

Query: 421 DCHQVLETVGLATQLYRELICVPYMAKFVVFAKMNDPVESSLRCFCMTDDKVDKTLEQQE 480 

DCHQVLETVGLA+QLYRELICVPYMAKFVVFAK NDPVESSLRCFCMTDD+VDKTLEQQE 
Sbjct: 421 DCHQVLETVGLASQLYRELICVPYMAKFVVFAKTNDPVESSLRCFCMTDDRVDKTLEQQE 480 

Query: 481 NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 540 

NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 
Sbjct: 481 NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 540 

Query: 541 EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKKIEKTDGRQSFASLALRKRYSYLTEP 600 

EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKK EK D RQSFASLALRKRYSYLTEP 
Sbjct: 541 EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKKAEKADRRQSFASLALRKRYSYLTEP 600 

Query: 601 GMSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFMFLK 660 

MSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFM LK 
Sbjct: 601 SMSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFMLLK 660 

Query: 661 KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG 720 

KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG 
Sbjct: 661 KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG 720 

Query: 721 YPSLQVELETPTGLHYTPPTPFQQDDYFSDISSIESPLRTPSRLSDGLVPSQGNIEHSAD 780 

+PS QVELETP GL++TPP PFQQDD+FSDISSIESP RTPSRLSDGLVPSQGNIEH 
Sbjct: 721 HPSFQVELETPMGLYWTPPNPFQQDDHFSDISSIESPFRTPSRLSDGLVPSQGNIEHPTG 780 

Query: 781 GPPVVTAEDASLEDSKLEDSVPLTE 805 
GPPVVTAED SLEDSK++DSV +T+ 
Sbjct: 781 GPPVVTAEDTSLEDSKMDDSVTVTD 805 

Pedant information for DKFZphfkd2_24p5, frame 3 



PCT/IB00/01496 



Report for DKFZphfkd2_24p5.3 



[LENGTH] 811 

[MW] 90104.66 

[pi] 5.40 

[HOMOL] TREMBL:MMANK3A_1 gene: "Ank3"; product: "ankyrin 3"; Mus musculus epithelial ankyrin 3 (Ank3) 5kb isoform mRNA, complete cds. 0.0 

[BLOCKS] BL50017B Death domain proteins profile 

[PIRKW] phosphoprotein 0.0 

[PIRKW] alternative splicing 0.0 

[PIRKW] peripheral membrane protein 0.0 

[PIRKW] cytoskeleton 0.0 

[SUPFAM] ankyrin 0.0 

[SUPFAM] ankyrin repeat homology 0.0 

[SUPFAM] unassigned ankyrin repeat proteins 0.0 

[KW] TRANSMEMBRANE 2 

[KW] LOW_COMPLEXITY 1.73 % 



SEQ MALPQSEDAMTGDTDKYLGPQDLKELGDDSLPAEGYMGFSLGARSASLRSFSSDGSYTLN 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccceeeeeccccccccc 

MEM 

SEQ RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLVPSPIHSGFLV 

SEG 

PRD cccchhhhhhhhheeeehhhhhhhhhhhccccccccccccccccccccccccccccceee 

MEM MMMMMMMMMMMM 

SEQ SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 

SEG xxxxxxxxxxxxxx 

PRD eeeeeccccccccccccceeeecccccccccceeeeehhhhhccccccccccccccccee 

MEM MMMMMMMMMMMMMMMM M 

SEQ VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDLTELLNG 

SEG 

PRD eecccccceeeceeeeeeccccccccccceeeeeeccccceeeeeccccccchhhhhhhc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ MDEELDSPEELGKKRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF 

SEG 

PRD cccccchhhhhhhhheeeeeeccccceeeeehhhhhcccccccccccccceeeeeeeccc 

MEM 

SEQ PEGALTKRIRVGLQAQPVPDEIVKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS 

SEG 

PRD ccchhhhhhhhhhhhhccccceeeeccccccccccceeeccccccccccceeeecccccc 

MEM 

SEQ GEGVSNGYKGDTTPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA 

SEG 

PRD ccccccccccccccceeeeeeeeccccccccccccccceeeeeeccccccccccceeeec 

MEM 

SEQ DCHQVLETVGLATQLYRELICVPYMAKFVVFAKMNDPVESSLRCFCMTDDKVDKTLEQQE 

SEG 

PRD cchhhhhhhhhhhhhhhhhhhhcchhhhhheeecccchhhhhhhhccccchhhhhhhhhc 

MEM 

SEQ NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 

SEG 

PRD cceeecccceeeeeeccceeeeecccccccchhhhhhhhhchhhhhhhcceeeeeecccc 

MEM 

SEQ EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKKIEKTDGRQSFASLALRKRYSYLTEP 

SEG 1 

PRD ccccceeeeccccccccccccccccccccccccccccccccchhhhhhhhhhhhheeecc 

MEM 

SEQ GMSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFMFLK 

SEG 

PRD ccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhhcceeeeecccchhhhhhhhhh 

MEM 
SEQ KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG 

SEG 

PRD hhhhcccccccchhhhhhhhhhcceeeeeeeccccccccccccccccccccccccccccc 

MEM 

SEQ YPSLQVELETPTGLHYTPPTPFQQDDYFSDISSIESPLRTPSRLSDGLVPSQGNIEHSAD 

SEG 

PRD cccceeeeeccccccccccccccccccccceeeccccccccccccccccccccccccccc 

MEM 

SEQ GPPVVTAEDASLEDSKLEDSVPLTEMPEAVM 

SEG 

PRD ccceeeecccccccccccccccccccccccc 

MEM 



(No Prosite data available for DKFZphfkd2_24p5.3) 
(No Pfam data available for DKFZphfkd2_24p5.3) 
DKFZphfkd2_3i13 



group: transmembrane protein 

DKFZphfkd2_3i13 encodes a novel 406 amino acid protein with similarity to a C. elegans cosmid 
Y37D8A protein and the A. thaliana hypothetical protein H71412. 

The novel protein contains 3 transmembrane regions. 
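The report does not state how the transmembrane regions were predicted. Purely as an illustration of how such predictions work, and not necessarily the method used here, a classic Kyte-Doolittle sliding-window hydropathy scan flags stretches whose mean hydropathy over a 19-residue window reaches about 1.6 (the window and threshold are the commonly quoted values and are assumptions in this sketch):

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Merge windows with mean hydropathy >= threshold into 1-based spans."""
    segments = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[i:i + window]) / window
        if mean < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            # Overlapping or adjacent window: extend the current segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# A synthetic 25-residue leucine stretch flanked by lysines.
print(hydrophobic_segments("K" * 20 + "L" * 25 + "K" * 20))  # [(16, 50)]
```

The number of segments found for a real protein depends on the window and threshold, so this scan need not reproduce the three regions reported above.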

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes and as a new marker for kidney cells. 

similarity to A. thaliana and C. elegans; 
membrane regions : 3 

complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 
Locus: /map="17" 
Insert length: 2052 bp 

Poly A stretch at pos. 2032, no polyadenylation signal found 



1 AGTGACGTGA GCGGGTTCCG GTTGTCTGGA GCCCAGCGGC GGGTGTGAGA 
51 GTCCGTAAGG AGCAGCTTCC AGGATCCTGA GATCCGGAGC AGCCGGGGTC 
101 GGAGCGGCTC CTCAAGAGTT ACTGATCTAT GAAATGGCAG AGAATGGAAA 
151 AAATTGTGAC CAGAGACGTG TAGCAATGAA CAAGGAACAT CATAATGGAA 
201 ATTTCACAGA CCCCTCTTCA GTGAATGAAA AGAAGAGGAG GGAGCGGGAA 
251 GAAAGGCAGA ATATTGTCCT GTGGAGACAG CCGCTCATTA CCTTGCAGTA 
301 TTTTTCTCTG GAAATCCTTG TAATCTTGAA GGAATGGACC TCAAAATTAT 
351 GGCATCGTCA AAGCATTGTG GTGTCTTTTT TACTGCTGCT TGCTGTGCTT 
401 ATAGCTACGT ATTATGTTGA AGGAGTGCAT CAACAGTATG TGCAACGTAT 
451 AGAGAAACAG TTTCTTTTGT ATGCCTACTG GATAGGCTTA GGAATTTTGT 
501 CTTCTGTTGG GCTTGGAACA GGGCTGCACA CCTTTCTGCT TTATCTGGGT 
551 CCACATATAG CCTCAGTTAC ATTAGCTGCT TATGAATGCA ATTCAGTTAA 
601 TTTTCCCGAA CCACCCTATC CTGATCAGAT TATTTGTCCA GATGAAGAGG 
651 GCACTGAAGG AACCATTTTT TTGTGGAGTA TCATCTCAAA AGTTAGGATT 
701 GAAGCCTGCA TGTGGGGTAT CGGTACAGCA ATCGGAGAGC TGCCTCCATA 
751 TTTCATGGCC AGAGCAGCTC GCCTCTCAGG TGCTGAACCA GATGATGAAG 
801 AGTATCAGGA ATTTGAAGAG ATGCTGGAAC ATGCAGAGTC TGCACAAGAC 
851 TTTGCCTCCC GGGCCAAACT GGCAGTTCAA AAACTAGTAC AGAAAGTTGG 
901 ATTTTTTGGA ATTTTGGCCT GTGCTTCAAT TCCAAATCCT TTATTTGATC 
951 TGGCTGGAAT AACGTGTGGA CACTTTCTGG TACCTTTTTG GACCTTCTTT 
1001 GGTGCAACCC TAATTGGAAA AGCAATAATA AAAATGCATA TCCAGAAAAT 
1051 TTTTGTTATA ATAACATTCA GCAAGCACAT AGTGGAGCAA ATGGTGGCTT 
1101 TCATTGGTGC TGTCCCCGGC ATAGGTCCAT CTCTGCAGAA GCCATTTCAG 
1151 GAGTACCTGG AGGCTCAACG GCAGAAGCTT CACCACAAAA GCGAAATGGG 
1201 CACACCACAG GGAGAAAACT GGTTGTCCTG GATGTTTGAA AAGTTGGTCG 
1251 TTGTCATGGT GTGTTACTTC ATCCTATCTA TCATTAACTC CATGGCACAA 
1301 AGTTATGCCA AACGAATCCA GCAGCGGTTG AACTCAGAGG AGAAAACTAA 
1351 ATAAGTAGAG AAAGTTTTAA ACTGCAGAAA TTGGAGTGGA TGGGTTCTGC 
1401 CTTAAATTGG GAGGACTCCA AGCCGGGAAG GAAAATTCCC TTTTCCAACC 
1451 TGTATCAATT TTTACAACTT TTTTCCTGAA AGCAGTTTAG TCCATACTTT 
1501 GCACTGACAT ACTTTTTCCT TCTGTGCTAA GGTAAGGTAT CCACCCTCGA 
1551 TGCAATCCAC CTTGTGTTTT CTTAGGGTGG AATGTGATGT TCAGCAGCAA 
1601 ACTTGCAACA GACTGGCCTT CTGTTTGTTA CTTTCAAAAG GCCCACATGA 
1651 TACAATTAGA GAATTCCCAC CGCACAAAAA AAGTTCCTAA GTATGTTAAA 
1701 TATGTCAAGC TTTTTAGGCT TGTCACAAAT GATTGCTTTG TTTTCCTAAG 
1751 TCATCAAAAT GTATATAAAT TATCTAGATT GGATAACAGT CTTGCATGTT 
1801 TATCATGTTA CAATTTAATA TTCCATCCTG CCCAACCCTT CCTCTCCCAT 
1851 CCTCAAAAAA GGGCCATTTT ATGATGCATT GCACACCCTC TGGGGAAATT 
1901 GATCTTTAAA TTTTGAGACA GTATAAGGAA AATCTGGTTG GTGTCTTACA 
1951 AGTGAGCTGA CACCATTTTT TATTCTGTGT ATTTAGGATG AAGTCTTGAA 
2001 AAAAACTTTA TAAAGACATC TTTAATCATT CCAAAAAAAA AAAAAAAAAA 
2051 AA 



BLAST Results 



Entry AC004686 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Homo sapiens chromosome 17, clone 
hRPC.1073 F 15; HTGS phase 1, 8 unordered pieces. 
Score = 4142, P = 6.1e-199, identities = 830/832 
Medline entries 

No Medline entry 

Peptide information for frame 2 

ORF from 134 bp to 1351 bp; peptide length: 406 
Category: similarity to unknown protein 



1 MAENGKNCDQ RRVAMNKEHH NGNFTDPSSV NEKKRREREE RQNIVLWRQP 
51 LITLQYFSLE ILVILKEWTS KLWHRQSIVV SFLLLLAVLI ATYYVEGVHQ 
101 QYVQRIEKQF LLYAYWIGLG ILSSVGLGTG LHTFLLYLGP HIASVTLAAY 
151 ECNSVNFPEP PYPDQIICPD EEGTEGTIFL WSIISKVRIE ACMWGIGTAI 
201 GELPPYFMAR AARLSGAEPD DEEYQEFEEM LEHAESAQDF ASRAKLAVQK 
251 LVQKVGFFGI LACASIPNPL FDLAGITCGH FLVPFWTFFG ATLIGKAIIK 
301 MHIQKIFVII TFSKHIVEQM VAFIGAVPGI GPSLQKPFQE YLEAQRQKLH 
351 HKSEMGTPQG ENWLSWMFEK LVVVMVCYFI LSIINSMAQS YAKRIQQRLN 
401 SEEKTK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_3i13, frame 2 

TREMBL:CEY37D8A_20 gene: "Y37D8A.22"; Caenorhabditis elegans cosmid 
Y37D8A, N = 1, Score = 905, P = 8.8e-91 

TREMBL:ATAC98_2 gene: "YUP8H12.2"; Arabidopsis thaliana chromosome 1 
YAC yUP8H12 complete sequence., N = 1, Score = 470, P = 1.1e-44 

PIR:H71412 hypothetical protein - Arabidopsis thaliana, N = 1, Score = 
293, P = 6e-24 

>TREMBL:CEY37D8A_20 gene: "Y37D8A.22"; Caenorhabditis elegans cosmid 
Y37D8A 

Length = 457 

HSPs: 

Score = 905 (135.8 bits), Expect = 8.8e-91, P = 8.8e-91 
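Expect and P are linked through standard Karlin-Altschul statistics: the number of chance hits scoring at least S is modelled as Poisson with mean E, where m and n are the effective query and database lengths and K and λ are scoring-system parameters. The probability of at least one such hit is then 1 - e^{-E}, which for E as small as 8.8e-91 is numerically identical to E, matching the equal Expect and P values reported here (this summary does not cover the sum statistics behind the "Sum P(3)" values):

```latex
\[
  E = K\,m\,n\,e^{-\lambda S}, \qquad
  P = 1 - e^{-E} \approx E \quad \text{for } E \ll 1
\]
```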
Identities = 167/317 (52%), Positives = 228/317 (71%) 

Query:  38 REERQNIVLWRQPLITLQYFSLEILVILKEWTSKLWHRQSIVVSFLLLLAVLIATYYVEG 97 
           R ER+ IV WR+P I + Y +EI + E K+ +++++ + + + + Y+ G 
Sbjct:  93 

Query:  98 VHQQYVQRIEKQFLLYAYWIGLGILSSVGLGTGLHTFLLYLGPHIASVTLAAYECNSVNF 157 
            HQ++VQ IEK L +++W+ LG+LSS+GLG+GLHTFL+YLGPHIA+VT+AAYEC S++F 
Sbjct: 153 AHQEHVQTIEKHILWWSWWVLLGVLSSIGLGSGLHTFLIYLGPHIAAVTMAAYECQSLDF 212 

Query: 158 PEPPYPDQIICPDEEGTEGTIFLWSIISKVRIEACMWGIGTAIGELPPYFMARAARLSGA 217 
           P+PPYP+ I CP + + F W I++KVR+E+ +WG GTA+GELPPYFMARAAR+SG 
Sbjct: 213 PQPPYPESIQCPSTKSSIAVTF-WQIVAKVRVESLLWGAGTALGELPPYFMARAARISGQ 271 

Query: 218 EPDDEEYQEFEEMLE-HAESAQD----FASRAKLAVQKLVQKVGFFGILACASIPNPLFD 272 
           EPDDEEY+EF E++ ES D RAK V+ + ++GF GIL ASIPNPLFD 
Sbjct: 272 EPDDEEYREFLELMNADKESDADQKLSIVERAKSWVEHNIHRLGFPGILLFASIPNPLFD 331 

Query: 273 LAGITCGHFLVPFWTFFGATLIGKAIIKMHIQKIFVIITFSKHIVEQMVAFIGAVPGIGP 332 
           LAGITCGHFLVPFW+FFGATLIGKA++KMH+Q FVI+ FS H E V + +P +GP 
Sbjct: 332 LAGITCGHFLVPFWSFFGATLIGKALVKMHVQMGFVILAFSDHHAENFVKILEKIPAVGP 391 

Query: 333 SLQKPFQEYLEAQRQKLH 350 
           +++P + LE QR+ LH 
Sbjct: 392 YIRQPISDLLEKQRKALH 409 

Pedant information for DKFZphfkd2_3i13, frame 2 

Report for DKFZphfkd2_3i13.2 
[LENGTH] 406 

[MW] 46298.17 

[pi] 6.47 

[HOMOL] TREMBL:CEY37D8A_20 gene: "Y37D8A.22"; Caenorhabditis elegans cosmid Y37D8A 1e-79 

[PROSITE] MYRISTYL 10 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 1 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 3 

[KW] LOW_COMPLEXITY 9.85 % 



SEQ MAENGKNCDQRRVAMNKEHHNGNFTDPSSVNEKKRREREERQNIVLWRQPLITLQYFSLE 

SEG xxxxxxxxxx 

PRD ccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhhhhhhccccchhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ ILVILKEWTSKLWHRQSIVVSFLLLLAVLIATYYVEGVHQQYVQRIEKQFLLYAYWIGLG 

SEG xxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeccchhhhhhhhhhhhhhhhhhhhhh 

MEM MM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ ILSSVGLGTGLHTFLLYLGPHIASVTLAAYECNSVNFPEPPYPDQIICPDEEGTEGTIFL 

SEG xxxxxxxxxxx 

PRD hccccccccceeeeeeeccchhhhhhhhhhhccccccccccccccccccccccccceeee 

MEM 

SEQ WSIISKVRIEACMWGIGTAIGELPPYFMARAARLSGAEPDDEEYQEFEEMLEHAESAQDF 

SEG xxxxxxxxxxxxxxx 

PRD eehhhhhhhhhhhhhccccccccccchhhhhhhhcccccchhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ ASRAKLAVQKLVQKVGFFGILACASIPNPLFDLAGITCGHFLVPFWTFFGATLIGKAIIK 

SEG 

PRD hhhhhhhhhhhhhhhcceeeeeeeecccccccccccccccceeeeeeehhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMM 

SEQ MHIQKIFVIITFSKHIVEQMVAFIGAVPGIGPSLQKPFQEYLEAQRQKLHHKSEMGTPQG 

SEG 

PRD hhhhheeeeeeechhhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhcccccccc 



MEM 



SEQ ENWLSWMFEKLVVVMVCYFILSIINSMAQSYAKRIQQRLNSEEKTK 

SEG 

PRD cchhhhhhhhhheeehhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

MEM 



Prosite for DKFZphfkd2_3i13.2 



PS00001  23->27    ASN_GLYCOSYLATION  PDOC00001
PS00005  69->72    PKC_PHOSPHO_SITE   PDOC00005
PS00006  29->33    CK2_PHOSPHO_SITE   PDOC00006
PS00006  215->219  CK2_PHOSPHO_SITE   PDOC00006
PS00006  236->240  CK2_PHOSPHO_SITE   PDOC00006
PS00008  120->126  MYRISTYL           PDOC00008
PS00008  126->132  MYRISTYL           PDOC00008
PS00008  173->179  MYRISTYL           PDOC00008
PS00008  195->201  MYRISTYL           PDOC00008
PS00008  197->203  MYRISTYL           PDOC00008
PS00008  259->265  MYRISTYL           PDOC00008
PS00008  275->281  MYRISTYL           PDOC00008
PS00008  325->331  MYRISTYL           PDOC00008
PS00008  329->335  MYRISTYL           PDOC00008
PS00008  356->362  MYRISTYL           PDOC00008



(No Pfam data available for DKFZphfkd2_3i13.2) 
DKFZphfkd2_3o17 



group: metabolism 

DKFZphfkd2_3o17 encodes a novel 72 amino acid protein with similarity to the Bos taurus 
NADH-ubiquinone oxidoreductase B22 subunit (EC 1.6.5.3) (EC 1.6.99.3). 

NADH:ubiquinone oxidoreductase is the first enzyme in the respiratory electron transport chain 
of mitochondria. It is a membrane-bound multi-subunit complex; the bovine heart enzyme 
contains about 40 different polypeptides. The novel protein is the human orthologue of bovine 
B22. 

The new protein can find application in modulation of the respiratory electron transport chain 
pathways of mitochondria. 



strong similarity to bovine NADH-UBIQUINONE OXIDOREDUCTASE B22 subunit 

complete cDNA, complete cds, EST hits, 

in frame stop codon at -274 will be checked 

ESTs HS1291620/AA883920 show no stop codon at this site 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 693 bp 

Poly A stretch at pos. 670, polyadenylation signal at pos. 659 



1 CAGCAGGCGT GCAGTTTCCC GGCTCTCCGC GCGGCCGGGG AAGGTCAGCG 

51 CCGTAATGGC GTTCTTGGCG TCGGGACCCT ACCTGACCCA TCAGCAAAAG 

101 GTGTTGCGGC TTTATAAGCG GGCGCTACGC CACCTCGAGT CGTGGTGCGT 

151 CCAGAGAGAC AAATACCGAT ACTTTGCTTG TTTGATGAGA GCCCGGTTTG 

201 AAGAACATAA GAATGAAAAG GATATGGCGA AGGCCACCCA GCTGCTGAAG 

251 GAGGCCGAGG AAGAATTCTG GTAACGTCAG CATCCACAGC CATACATCTT 

301 CCCTGACTCT CCTGGGGGCA CCTCCTATGA GAGATACGAT TGCTACAAGG 

351 TCCCAGAATG GTGCTTAGAT GACTGGCATC CTTCTGAGAA GGCAATGTAT 

401 CCTGATTACT TTGCCAAGAG AGAACAGTGG AAGAAACTGC GGAGGGAAAG 

451 CTGGGAACGA GAGGTTAAGC AGCTGCAGGA GGAAACGCCA CCTGGTGGTC 

501 CTTTAACTGA AGCTTTGCCC CCTGCCCGAA AGGAAGGTGA TTTGCCCCCA 

551 CTGTGGTGGT ATATTGTGAC CAGACCCCGG GAGCGGCCCA TGTAGAAAGA 

601 GAGAGACCTC ATCTTTCATG CTTGCAAGTG AAATATGTTA CAGAACATGC 

651 ACTTGCCCTA ATAAAAAATC AGTAAAAAAA AAAAAAAAAA AAA 



BLAST Results 



Entry S28256 from database PIR: 

NADH dehydrogenase (ubiquinone) (EC 1.6.5.3) chain CI-B22 - bovine 
>TREMBL:MIBTCIB22_1 gene: "cI-B22"; product: "NADH-ubiquinone 
oxidoreductase complex B22 subunit"; B. taurus mitochondrion cI-B22 
mRNA for B22 subunit of the NADH-ubiquinone oxidoreductase complex 
Score = 933, P = 5.2e-93, identities = 163/179, positives = 172/179, 
frame +2 



Medline entries 



92389317: 

Sequences of 20 subunits of NADH ubiquinone oxidoreductase from bovine heart mitochondria. 
Application of a novel strategy for sequencing proteins using the polymerase chain reaction. 

Peptide information for frame 2 



ORF from 56 bp to 271 bp; peptide length: 72 
Category: strong similarity to known protein 



1 MAFLASGPYL THQQKVLRLY KRALRHLESW CVQRDKYRYF ACLMRARFEE 

51 HKNEKDMAKA TQLLKEAEEE FW*RQHPQPY IFPDSPGGTS YERYDCYKVP 

101 EWCLDDWHPS EKAMYPDYFA KREQWKKLRR ESWEREVKQL QEETPPGGPL 

151 TEALPPARKE GDLPPLWWYI VTRPRERPM 
BLASTP hits 

Sequences producing significant alignments: (bits) Value 

sp|Q02369|NI2M_BOVIN|0D36CE17281FB735 (NDUFB9..) NADH-UBIQUINONE... 141 7e-34 
tr|U41534|Q18036|D34BCCB6E8FBCD5F (C16A3.4) SIMILAR TO NADH-UBIQ... 53 3e-07 

>sp|Q02369|NI2M_BOVIN|0D36CE17281FB735 (NDUFB9..) NADH-UBIQUINONE 
OXIDOREDUCTASE B22 SUBUNIT (EC 1.6.5.3) (EC 1.6.99.3) 
(COMPLEX I-B22) (CI-B22). [BOS TAURUS] 
Length = 178 

Score = 141 bits (351), Expect = 7e-34 
Identities = 63/71 (88%), Positives = 68/71 (95%) 

Query: 2 AFLASGPYLTHQQKVLRLYKRALRHLESWCVQRDKYRYFACLMRARFEEHKNEKDMAKAT 61 

AFL+SG YLTHQQKVLRLYKRALRHLESWC+ RDKYRYFACL+RARF+EHKNEKDM KAT 
Sbjct: 1 AFLSSGAYLTHQQKVLRLYKRALRHLESWCIHRDKYRYFACLLRARFDEHKNEKDMVKAT 60 

Query: 62 QLLKEAEEEFW 72 

QLL+EAEEEFW 
Sbjct: 61 QLLREAEEEFW 71 

>tr|U41534|Q18036|D34BCCB6E8FBCD5F (C16A3.4) SIMILAR TO 
NADH-UBIQUINONE OXIDOREDUCTASE B22. [CAENORHABDITIS ELEGANS] 

Length = 163 

Score = 52.7 bits (124), Expect = 3e-07 

Identities = 25/64 (39%), Positives = 41/64 (64%), Gaps = 1/64 (1%) 

Query: 10 LTHQQKVLRLYKRALRHLESWCVQRD-KYRYFACLMRARFEEHKNEKDMAKATQLLKEAE 68 

L+H+QKV RLYKR LR +++W + + R+ C++RARF+ + +E D K+ LL + 
Sbjct: 12 LSHRQKVTRLYKRCLREVDNWYGGNNLEVRFQKCIIRARFDANADEVDTRKSQILLADGC 71 

Query: 69 EEFW 72 
+ W 

Sbjct: 72 RQLW 75 

Alert BLASTP hits for DKFZphfkd2_3o17, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_3o17, frame 2 



Report for DKFZphfkd2_3o17.2 

[LENGTH] 72 

[MW] 8839.28 

[pi] 9.26 

[HOMOL] PIR:S28256 NADH dehydrogenase (ubiquinone) (EC 1.6.5.3) chain CI-B22 - bovine 2e-34 

[KW] All_Alpha 

SEQ MAFLASGPYLTHQQKVLRLYKRALRHLESWCVQRDKYRYFACLMRARFEEHKNEKDMAKA 
PRD ccccccccchhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhcchhhhhhh 

SEQ TQLLKEAEEEFW 
PRD hhhhhhhhhccc 

(No Prosite data available for DKFZphfkd2_3o17.2) 
(No Pfam data available for DKFZphfkd2_3o17.2) 
DKFZphfkd2_46a6 



group: kidney derived 

DKFZphfkd2_46a6 encodes a novel 315 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 
Sequenced by MediGenomix 

Locus: /map="228.6 cR from top of Chr15 linkage group" 
Insert length: 2774 bp 

Poly A stretch at pos. 2751, polyadenylation signal at pos. 2732 



1 CTCGCGAGCG CAGCTATGGC TGCTGGCGTA CCCTGTGCGT TAGTCACCAG
51 CTGCTCCTCC GTCTTCTCAG GAGACCAGCT GGTCCAACAT ACCCTTGGAA
101 CAGAAGATCT TATTGTGGAA GTGACTTCCA ATGATGCTGT GAGATTTTAT
151 CCCTGGACCA TTGATAATAA ATACTATTCA GCAGACATCA ATCTATGTGT
201 GGTGCCAAAC AAATTTCTTG TTACTGCAGA GATTGCAGAA TCTGTCCAAG
251 CATTTGTGGT TTACTTTGAC AGCACACGAA AATCGGGCCT TGATAGTGTC
301 TCCTCATGGC TTCCACTGGC AAAAGCATGG TTACCTGAGG TGATGATCTT
351 GGTCTGCGAT AGAGTGTCTG AAGATGGTAT AAACCGACAA AAAGCTCAAG
401 AATGGAGCCT CAAACATGGC TTTGAATTGG TAGAACTTAG TCCAGAGGAG
451 TTGCCTGAGG AGGATGATGA CTTCCCAGAA TCTACAGGAG TAAAGCGAAT
501 TGTCCAAGCC CTGAATGCCA ATGTGTGGTC CAATGTAGTG ATGAAGAATG
551 ATAGGAACCA AGGCTTTAGC CTTCTCAACT CATTGACTGG AACAAACCAT
601 AGCATTGGGT CAGCAGATCC CTGTCACCCA GAGCAACCCC ATTTGCCAGC
651 AGCAGATAGT ACTGAATCCC TCTCTGATCA TCGGGGTGGT GCATCTAACA
701 CAACAGATGC CCAGGTTGAT AGCATTGTGG ATCCCATGTT AGATCTGGAT
751 ATTCAAGAAT TAGCCAGTCT TACCACTGGA GGAGGAGATG TGGAGAATTT
801 TGAAAGACCC TTTTCAAAGT TAAAGGAAAT GAAAGACAAG GCTGCGACGC
851 TTCCTCATGA GCAAAGAAAA GTGCATGCAG AAAAGGTGGC CAAAGCATTC
901 TGGATGGCAA TCGGGGGAGA CAGAGATGAA ATTGAAGGCC TTTCATCTGA
951 TGGAGAGCAC TGAATTATTC ATACTAGGGT TTGACCAACA AAGATGCTAG
1001 CTGTCTCTGA GATACCTCTC TACTCAGCCC AGTCATATTT TGCCAAAATT
1051 GCCCTTATCA TGTTGGCTGC CTGACTTGTT TATAGGGTCC CCTTAATTTT
1101 AGTTTTTAGT AGGAGGTTAA GGAGAAATCT TTTTTTTCCT CAGTATATTG
1151 TAAGAGAGTG AGGAATACAG TGATAGTAAT GAGTGAGGAT TTCTTAAATA
1201 TACTTTTTTT TTGTTCTAGG AATGAGGGTA GGATAAATCT CAGAGGTCTG
1251 TGTGATTTAC TCAAGTTGAA GACAACCTCC AGGCCATTCC TGGTCAACCT
1301 TTTAAGTAGC ATTTCCAGCA TTCACACTTG ATACTGCACA TCAGGAGTTG
1351 TGTCACCTTT CCTGGGTGAT TTGGGTTTTC TCCATTCAAG GAGCTTGTAG
1401 CTCTGAGCTA TGATGCTTTT ATTGGGAGGA AAGGAGGCAG CTGCAGAATT
1451 GATGTGAGCT ATGTGGGGCC GAAGTCTCAG CCCGCAGCTA AGTCTCTACC
1501 TAAGAAAATG CCTCTGGGCA TTCTTTTGAA GTATAGTGTC TGAGCTCATG
1551 CTAGAAAGAA TCAAAAAGCC AGTGTGGATT TTTAGGCTGT AATAAATGAG
1601 GCAAAGGATT TCTATTCCAG TGGGAAGGAA ACCTCTCTAC TGAGTTGTGG
1651 GGGATATGTT GTATGTTAGA GAGAACCTTA AGGAGTCCTT GTATGGGCCA
1701 TGGAGACAGT ATGTGATAAC ATACCGTGAT TTTCATGAAG AAATTCTTCT
1751 GTCCTAGAGT TCTCCCCTGC TGCTTGAGAT GCCAGAGCTG TGTTGTTGCA
1801 CACCTGCAAA ACAAGGCACA TTTCCCCCTT TCTCTTTAAA GCCAAAGAGA
1851 GATCACTGCC AAAGTGGGAG CACTAAGGGG TGGGTGGGGA AGTGAAATGT
1901 TAGGCGATGA ATTCCTGAGC ACCTTGTTTT TCTTCCAAGG TTCGTAGCTC
1951 CTCTCTGCCC TTCCAAGCCT GTAACCTCGG AGGACTATCT TTTGTTCTCT
2001 ATCCTTTGTC TTGTTAGAGT GGGTCAGCCC CAGAGGAACT GATAAGCAAA
2051 TGGCAAGTTT TTAAAGGAAG AGTGGAAAGT ACTGCAAATA AAAATCCTTA
2101 TTTGTTTTTG TAGACTTTGT AATGCATATC ATTAGCCCTC ACTGTGATCA
2151 TTACTGCTGT GGCTCTGAAC TGGCACATAG TACAGTGGAT GGAAGGTGCC
2201 CGCACACCAG CTGAGAACTG GTTCTGGCCT AGGTGGGCTC TAGAACCATT
2251 TACACAGCAT GAAAGAAACA GGTTGGGTTA GGAGCAGAAA GAAATAAGGC
2301 TCACACCCCT CCAGACACTA CCTTATAAGC ACTGCAGAAC CTGAAACAGA
2351 TGGCAGAAGG AATGGAATGC TACAGGGGCC AGCAGGAGTG ACCACAGGGA
2401 GGGGACAGCT CAGTGACTGG AGCATTCAGG AAGAGGCTTT CCAGGGAACA
2451 CTGGACATTG CTTAGTGACC TTTTGTTCCT            TTTTCTTTTA
2501 CTGTTCTGAA AGACTTTGAG TCTGTGGTTC ACCACCAGCC CATCAGTGTT
2551 TCTTTGAGGT GATTGCATTA GGGAAGTTGG CTCTGGGATT GCAAAAAAAA
2601 AAAAAAGGTG GAACATGTTT TCCTTAAAAG ATGGAAGGTT TTAGAAAATA
2651 TACTAGGCCA TCTGGTTAGA AAAAACAGAC CAGACTAGAA AAAGCTGTGA
2701 ATTTGATTTT GTAGATTAAA CAAAGCCAGA TGATTAAAAT GTGATTTATT
2751 TATAAAAAAA AAAAAAAAAA AAAA 



BLAST Results 



Entry HS463358 from database EMBL: 
human STS WI-14364. 
Length = 472 
Minus Strand HSPs: 

Score = 1605 (240.8 bits), Expect = 5.0e-68, P = 5.0e-68
Identities = 347/361 (96%) 
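The percentages printed next to identity, positive and gap counts in these reports appear to be truncated rather than rounded. For example, 99/211 is 46.9% but is listed as 46%, and 67/160 (41.9%) is listed as 41%. The sketch below encodes that inferred convention and checks it against counts that appear in this document. The helper name `blast_percent` is illustrative only.

```python
def blast_percent(matches: int, length: int) -> int:
    """Integer percentage, truncated rather than rounded (inferred convention)."""
    return 100 * matches // length


# (count, alignment length, percentage printed in this document)
reported = [
    (25, 64, 39), (41, 64, 64), (1, 64, 1),     # DKFZphfkd2_3o17 vs C16A3.4
    (347, 361, 96),                             # DKFZphfkd2_46a6 vs STS WI-14364
    (34, 160, 21), (67, 160, 41),               # DKFZphfkd2_46a6 vs yptm3
    (112, 300, 37), (174, 300, 58),             # DKFZphfkd2_46b10 vs F25B5.3
    (99, 211, 46), (138, 211, 65),              # DKFZphfkd2_46j20 vs ZK688.3
]
for count, length, printed in reported:
    computed = blast_percent(count, length)
    print(f"{count}/{length}: computed {computed}%, printed {printed}%")
```

Rounding to the nearest integer would give 47% for 99/211, which disagrees with the report. That disagreement is why truncation is the more likely convention here.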



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 16 bp to 960 bp; peptide length: 315 
Category: putative protein 
Classification: unset 
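The ORF coordinates and peptide lengths in these entries are consistent with each other under one reading. The coordinates are 1-based and inclusive, and the range ends at the last sense codon, so the stop codon lies just outside it. For DKFZphfkd2_46a6, for instance, bases 961-963 read TGA. The sketch below encodes that reading. The function name is hypothetical, and the convention is inferred from the entries rather than stated in them.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residues encoded by an ORF given 1-based, inclusive coordinates.

    Assumes the stop codon is excluded from the range, as the entries here suggest.
    """
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3


# (ORF start, ORF end, reported peptide length) taken from the entries in this section
orfs = {
    "DKFZphfkd2_46a6": (16, 960, 315),
    "DKFZphfkd2_46b10": (43, 1050, 336),
    "DKFZphfkd2_46d13": (328, 1845, 506),
    "DKFZphfkd2_46j20": (7, 678, 224),
}
for clone, (start, end, reported) in orfs.items():
    print(clone, peptide_length(start, end), reported)
```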



1 MAAGVPCALV TSCSSVFSGD QLVQHTLGTE DLIVEVTSND AVRFYPWTID 
51 NKYYSADINL CVVPNKFLVT AEIAESVQAF VVYFDSTRKS GLDSVSSWLP 
101 LAKAWLPEVM ILVCDRVSED GINRQKAQEW SLKHGFELVE LSPEELPEED 
151 DDFPESTGVK RIVQALNANV WSNVVMKNDR NQGFSLLNSL TGTNHSIGSA 
201 DPCHPEQPHL PAADSTESLS DHRGGASNTT DAQVDSIVDP MLDLDIQELA 
251 SLTTGGGDVE NFERPFSKLK EMKDKAATLP HEQRKVHAEK VAKAFWMAIG 
301 GDRDEIEGLS SDGEH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46a6, frame 1

PIR:T04362 probable GTP-binding protein yptm3 - maize, N = 1, Score = 87, P = 0.21

PIR:S71585 GTP-binding protein GB2 - Arabidopsis thaliana, N = 1, Score = 86, P = 0.27

>PIR:T04362 probable GTP-binding protein yptm3 - maize
Length = 210

HSPs: 

Score = 87 (13.1 bits), Expect = 2.4e-01, P = 2.1e-01
Identities = 34/160 (21%), Positives = 67/160 (41%)

Query: 48 TIDNKYYSADINLCVVPNKFL-VTAEIAESVQAFVVYFDSTRKSGLDSVSSWLPLAKAWL 106

TIDNK I F +T ++ +D TR+ + ++SWL A+ 

Sbjct: 49 TIDNKPIKLQIWDTAGQESFRSITRSYYRGAAGALLVYDITRRETFNHLASWLEDARQHA 108 

Query: 107 PE---VMIL--VCDRVSEDGINRQKAQEWSLKHGFELVELSPEELPEEDDDFPESTGVKR 161

VM++ CD ++ ++ ++++ +HG +E S + ++ F ++ G

Sbjct: 109 NANMTVMLIGNKCDLSHRRAVSYEEGEQFAKEHGLVFMEASAKTAQNVEEAFIKTAGT-- 166

Query: 162 IVQALNANVWSNVVMKNDRNQGFSLLNSLTGTNHSIGSADPC 203 

I + + ++ N G+++ NS G S AC 

Sbjct: 167 IYKKIQDGIFDVSNESNGIKVGYAVPNSSGGGAGSSSQAGGC 208 

Pedant information for DKFZphfkd2_46a6, frame 1

Report for DKFZphfkd2_46a6.1

[LENGTH] 315

[MW] 34505.54

[pI] 4.55

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 6.67 %



SEQ MAAGVPCALVTSCSSVFSGDQLVQHTLGTEDLIVEVTSNDAVRFYPWTIDNKYYSADINL 

SEG 

PRD cccccceeeeecccccccccceeeeccccceeeeeeccccceeeecccccccccccccee 

SEQ CVVPNKFLVTAEIAESVQAFVVYFDSTRKSGLDSVSSWLPLAKAWLPEVMILVCDRVSED 

SEG 

PRD eeecccchhhhhhhhhhheeeeeeecccccccccccccccccccccccceeeeccccccc 

SEQ GINRQKAQEWSLKHGFELVELSPEELPEEDDDFPESTGVKRIVQALNANVWSNVVMKNDR 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cchhhhhhhhhhcccceeeeccccccccccccccccccchhhhhhhhcccceeeeeeccc 

SEQ NQGFSLLNSLTGTNHSIGSADPCHPEQPHLPAADSTESLSDHRGGASNTTDAQVDSIVDP 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccch 

SEQ MLDLDIQELASLTTGGGDVENFERPFSKLKEMKDKAATLPHEQRKVHAEKVAKAFWMAIG

SEG 

PRD hhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhc 

SEQ GDRDEIEGLSSDGEH 

SEG 

PRD ccccccccccccccc 



(No Prosite data available for DKFZphfkd2_46a6.1)
(No Pfam data available for DKFZphfkd2_46a6.1)
DKFZphfkd2_46b10



group: kidney derived 

DKFZphfkd2_46b10.1 encodes a novel 336 amino acid protein with similarity to C. elegans cosmid F25B5.3.

The novel protein contains an HTH_LYSR_FAMILY PROSITE pattern. Proteins of the LysR family are
bacterial transcriptional regulatory proteins that bind DNA using a helix-turn-helix motif.
Most of these proteins are transcription activators and usually negatively regulate their own
expression. They all possess a potential 'helix-turn-helix' DNA-binding motif in their N-terminal
section. The 'helix-turn-helix' motif is missing in DKFZphfkd2_46b10.1.
No informative BLAST results; no predictive Pfam or SCOP motif.

The new protein can find application in studying the expression profile of kidney-specific 
genes. 



similarity to C.elegans F25B5.3 

complete cDNA, complete cds, EST hits 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 1285 bp 

Poly A stretch at pos. 1266, no polyadenylation signal found 



1 CAGTCTACGC GAGCTGCCTG TTTTTTTCCT GCTTGGACGC GCATGAGGGC 

51 CCCGTCCATG GACCGCGCGG CCGTGGCGAG GGTGGGCGCG GTAGCGAGCG 

101 CCAGCGTGTG CGCCCTGGTG GCGGGGGTGG TGCTGGCTCA GTACATATTC 

151 ACCTTGAAGA GGAAGACGGG GCGGAAGACC AAGATCATCG AGATGATGCC 

201 AGAATTCCAG AAAAGTTCAG TTCGAATCAA GAACCCTACA AGAGTAGAAG 

251 AAATTATCTG TGGTCTTATC AAAGGAGGAG CTGCCAAACT TCAGATAATA 

301 ACGGACTTTG ATATGACACT CAGTAGATTT TCATATAAAG GGAAAAGATG 

351 CCCAACATGT CATAATATCA TTGACAACTG TAAGCTGGTT ACGGATGAAT 

401 GTAGAAAAAA GTTATTGCAA CTAAAGGAAA AATATTACGC TATTGAAGTT 

451 GATCCTGTTC TTACTGTAGA AGAGAAGTAC CCTTATATGG TGGAATGGTA 

501 TACTAAATCA CATGGTTTGC TTGTTCAGCA AGCTTTACCA AAAGCTAAAC 

551 TTAAAGAAAT TGTGGCAGAA TCTGACGTTA TGCTCAAAGA AGGATATGAG 

601 AATTTCTTTG ATAAGCTCCA ACAACATAGC ATCCCCGTGT TCATATTTTC 

651 GGCTGGAATC GGCGATGTAC TAGAGGAAGT TATTCGTCAA GCTGGTGTTT 

701 ATCATCCCAA TGTCAAAGTT GTGTCCAATT TTATGGATTT TGATGAAACT 

751 GGGGTGCTCA AAGGATTTAA AGGAGAACTA ATTCATGTAT TTAACAAACA 

801 TGATGGTGCC TTGAGGAATA CAGAATATTT CAATCAACTA AAAGACAATA 

851 GTAACATAAT TCTTCTGGGA GACTCCCAAG GAGACTTAAG AATGGCAGAT 

901 GGAGTGGCCA ATGTTGAGCA CATTCTGAAA ATTGGATATC TAAATGATAG 

951 AGTGGATGAG CTTTTAGAAA AGTACATGGA CTCTTATGAT ATTGTTTTAG 

1001 TACAAGATGA ATCATTAGAA GTAGCCAACT CTATTTTACA GAAGATTCTA 

1051 TAAACAAGCA TTCTCCAAGA AGACCTCTCT CCTGTGGGTG CAATTGAACT 

1101 GTTCATCCGT TCATCTTGCT GAGAGACTTA TTTATAATAT ATCCTTACTC 

1151 TCGAAGTGTT CCCTTTGTAT AACTGAAGTA TTTTCAGATA TGGTGAATGC 

1201 ATTGACTGGA AGCTCCTTTT CTCCACCTCT CTCAACACAC TCCTCACCGT 

1251 ATCTTTTAAC CCATTTAAAA AAAAAAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 43 bp to 1050 bp; peptide length: 336 
Category: similarity to unknown protein 
Classification: unset 

Prosite motifs: HTH_LYSR_FAMILY (16-47) 
1 MRAPSMDRAA VARVGAVASA SVCALVAGVV LAQYIFTLKR KTGRKTKIIE 

51 MMPEFQKSSV RIKNPTRVEE IICGLIKGGA AKLQIITDFD MTLSRFSYKG 

101 KRCPTCHNII DNCKLVTDEC RKKLLQLKEK YYAIEVDPVL TVEEKYPYMV 

151 EWYTKSHGLL VQQALPKAKL KEIVAESDVM LKEGYENFFD KLQQHSIPVF 

201 IFSAGIGDVL EEVIRQAGVY HPNVKVVSNF MDFDETGVLK GFKGELIHVF 

251 NKHDGALRNT EYFNQLKDNS NIILLGDSQG DLRMADGVAN VEHILKIGYL 

301 NDRVDELLEK YMDSYDIVLV QDESLEVANS ILQKIL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46b10, frame 1

SWISSPROT:YQT3_CAEEL HYPOTHETICAL 42.0 KD PROTEIN F25B5.3 IN CHROMOSOME
III., N = 1, Score = 524, P = 2.2e-50

TREMBL:AC005499_12 gene: "T6A23.12"; Arabidopsis thaliana chromosome 
II BAC T6A23 genomic sequence, complete sequence., N = 2, Score = 194, 
P = 1.4e-26 



>SWISSPROT: YQT3_CAEEL HYPOTHETICAL 42.0 KD PROTEIN F25B5.3 IN CHROMOSOME 
III. 

Length = 376

HSPs:

Score = 524 (78.6 bits), Expect = 2.2e-50, P = 2.2e-50
Identities = 112/300 (37%), Positives = 174/300 (58%)



Query: 44 RKTKIIEMMPEFQ--KSSVRIKNPTRVEEIICGLIKGGAAKLQIITDFDMTLSRFSYK-G 100 

+KT ++ ++ + + + + +PT V + ++ GGA K +I+DFD TLSRF+ + G 
Sbjct: 73 KKTDVVPLLMNYLLGEEQILVADPTAVAAKLRKMVVGGAGKTVVISDFDYTLSRFANEQG 132 

Query: 101 KRCPTCHNIID-NCKLVTDECRKKLLQLKEKYYAIEVDPVLTVEEKYPYMVEWYTKSHGL 159 

+R T H + D N + E +K + LK KYY IE P LT+EEK P+M +W+ SH L 
Sbjct: 133 ERLSTTHGVFDDNVMRLKPELGQKFVDLKNKYYPIEFSPNLTMEEKIPHMEKWWGTSHSL 192 

Query: 160 LVQQALPKAKLKEIVAESDVMLKEGYENFFDKLQQHSIPVFIFSAGIGDVLEEVIRQA-G 218 

+V + K +++ V +S ++ K+G E+F + L H+IP+ IFSAGIG+++E ++Q G 
Sbjct: 193 IVNEKFSKNTIEDFVRQSRIVFKDGAEDFIEALDAHNIPLVIFSAGIGNIIEYFLQQKLG 252 

Query: 219 VYHPNVKVVSNFMDFDETGVLKGFKGELIHVFNKHDGAL-RNTEYFNQLKDNSNIILLGD 277 

N +SN + FDE F LIH F K+- + + T +F+ + N+ILLGD 

Sbjct: 253 AIPRNTHFISNMILFDEDDNACAFSEPLIHTFCKNSSVIQKETSFFHDIAGRVNVILLGD 312 

Query: 278 SQGDLRMADGVANVEHILKIGYLNDRVDEL--LEKYMDSYDIVLVQDESLEVANSILQKI 335 

S GD+ M GV LK+GY N +D+ L+ Y + YDIVL+ D +L VA I+ I 

Sbjct: 313 SMGDIHMDVGVERDGPTLKVGYYNGSLDDTAALQHYEEVYDIVLIHDPTLNVAQKIVDII 372 

Pedant information for DKFZphfkd2_46b10, frame 1



Report for DKFZphfkd2_46b10.1

[LENGTH] 336 

[MW] 37948.37

[pI] 6.67

[HOMOL] SWISSPROT:YQT3_CAEEL HYPOTHETICAL 42.0 KD PROTEIN F25B5.3 IN CHROMOSOME III. 3e-51

[PROSITE] HTH_LYSR_FAMILY 1

[KW] TRANSMEMBRANE 2

[KW] LOW_COMPLEXITY 7.44 %

SEQ MRAPSMDRAAVARVGAVASASVCALVAGVVLAQYIFTLKRKTGRKTKIIEMMPEFQKSSV 

SEG xxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccchhhhhcchhhhhhheeehhhhhhhhhhhhhhhhhhhhhccceeeehhhhhhhhhee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ RIKNPTRVEEIICGLIKGGAAKLQIITDFDMTLSRFSYKGKRCPTCHNIIDNCKLVTDEC 

SEG 

PRD eecccchhhhhhhhhhccccceeeeecccccceeeecccccccccccccccccchhhhhh 

MEM 
SEQ RKKLLQLKEKYYAIEVDPVLTVEEKYPYMVEWYTKSHGLLVQQALPKAKLKEIVAESDVM 

SEG 

PRD hhhhhhhhhhhheeeccccccccccchhhhhhccccchhhhhhccchhhhhhhhhhhhcc 

MEM 

SEQ LKEGYENFFDKLQQHSIPVFIFSAGIGDVLEEVIRQAGVYHPNVKVVSNFMDFDETGVLK 

SEG 

PRD ccccchhhhhhhhhcccceeeeecccchhhhhhhhhhcccccceeeeeecccccccccee 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ GFKGELIHVFNKHDGALRNTEYFNQLKDNSNIILLGDSQGDLRMADGVANVEHILKIGYL 

SEG 

PRD eccceeeeeeecccccccccchhhhhhhhceeeeecccccccccccccccccceeeeeec 

MEM 

SEQ NDRVDELLEKYMDSYDIVLVQDESLEVANSILQKIL 

SEG 

PRD cchhhhhhhhhhhhheeeeeecchhhhhhhhhhccc 

MEM 



Prosite for DKFZphfkd2_46b10.1
PS00044 16->47 HTH_LYSR_FAMILY PDOC00043



(No Pfam data available for DKFZphfkd2_46b10.1)
DKFZphfkd2_46d13



group: kidney derived 

DKFZphfkd2_46d13 encodes a novel 506 amino acid protein with weak similarity to the KE03 protein.
The novel protein contains an RGD site.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of kidney-specific
genes.



similarity to KE03 protein 

complete cDNA, complete cds, EST hits 

Sequenced by MediGenomix 

Locus: /map="227.6 cR from top of Chr1 linkage group"
Insert length: 3346 bp

Poly A stretch at pos. 3328, polyadenylation signal at pos. 3308
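The poly(A) and signal positions quoted in these headers line up with the sequences if they are read as 0-based offsets. In this entry the terminal run of A's begins at base 3329 (offset 3328) and the AATAAA hexamer at base 3309 (offset 3308). The DKFZphfkd2_46j20 entry behaves the same way. The sketch below works under that assumption. It searches for the two canonical hexamers AATAAA and ATTAAA within a 40-nt window upstream of the tail. The window size and the function names are illustrative choices, not part of the original annotation pipeline.

```python
SIGNALS = ("AATAAA", "ATTAAA")  # canonical polyadenylation hexamers


def polya_start(seq: str) -> int:
    """0-based index of the first base of the 3'-terminal run of A's."""
    i = len(seq)
    while i > 0 and seq[i - 1] == "A":
        i -= 1
    return i


def polya_signal(seq: str, tail_start: int, window: int = 40) -> int:
    """0-based start of the signal hexamer nearest the tail, or -1 if none is found."""
    lo = max(0, tail_start - window)
    # str.rfind(sub, start, end) only reports hits lying entirely before the tail
    return max(seq.rfind(sig, lo, tail_start) for sig in SIGNALS)


# 3' ends of two entries; the offset is the 0-based position of the first base shown.
tails = {
    "DKFZphfkd2_46d13": (3300, "".join(["AAATTTTAAA", "TAAACATCTT", "TAAAGCTGAA",
                                        "AAAAAAAAAA", "AAAAAA"])),
    "DKFZphfkd2_46j20": (1650, "".join(["GAAATTACTG", "TTTTCCAAAT", "AAAGGTGCTC",
                                        "CCTTCCAAAA", "AAAAAAAAAA", "AAAAAA"])),
}
results = {}
for clone, (offset, tail) in tails.items():
    start = polya_start(tail)
    signal = polya_signal(tail, start)
    results[clone] = (offset + start, offset + signal if signal >= 0 else None)
    print(clone, "poly A at", results[clone][0], "signal at", results[clone][1])
```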



1 CTCTCGCGAG AGGAGCAAGA GGAAGATGGC CGTGCCCTGT TTTTCGGTGT 
51 AAGGCAGCAG ACGGCGGCTG CGACGGCGAG ACTGAGATCC TGGTGTCGTG 
101 GGCACCTGAG TTCTAGCTTC CCCCAGCGAG CGCGCGTCCC TTCGTGCCTA 
151 GGCGAGAGCC GGCTCTTCCC CGGGAGATGC GTTTGTCCCA GGCTCGGGGG 
201 CTCAGTGGGA GTTCATGCTG CGCTGGAGGC TCTTGGCCAC CGCTCTAATC 
251 GCCTTGTGCC GCCGCAGCGC CAGCTCCGTC GCCAGCGGTG AGCCTCCCGA 
301 TTCCCCCCCT TGCCCCTGGC GGCGGCGATG ACCGGGGAGA AGATCCGCTC 
351 ACTGCGGAGG GACCACAAGC CCAGCAAAGA AGAAGGGGAC CTGCTGGAGC 
401 CCGGGGATGA AGAAGCGGCG GCTGCCCTCG GCGGTACCTT TACCAGAAGC 
451 AGGATTGGCA AGGGCGGCAA AGCTTGTCAT AAGATCTTCA GTAACCATCA 
501 CCACCGGCTA CAGCTGAAGG CAGCTCCGGC CTCCTCCAAT CCCCCCGGCG 
551 CCCCGGCTCT GCCGCTGCAC AATTCCTCCG TGACTGCCAA CTCCCAGTCC 
601 CCGGCCCTTC TGGCCGGCAC CAACCCCGTT GCTGTCGTCG CGGATGGAGG 
651 CAGTTGCCCC GCACACTACC CGGTGCACGA GTGCGTCTTC AAGGGGGATG 
701 TGAGGAGACT CTCCTCTCTC ATCCGCACGC ACAATATCGG GCAGAAAGAT 
751 AATCACGGAA ATACTCCTTT ACACCTTGCT GTGATGTTAG GAAATAAAGT 
801 TACAGCTCTT TTGAGGAAGC TTAAGCAGCA ATCCAGGGAA AGTGTTGAAG 
851 AAAAACGACC TCGATTATTA AAAGCCCTGA AAGAGCTAGG TGACTTTTAT 
901 CTAGAACTTC ACTGGGATTT TCAAAGCTGG GTGCCTTTAC TTTCCCGAAT 
951 TCTGCCTTCC GATGCATGTA AAATATACAA ACAAGGTATC AATATCAGGC 
1001 TTGACACAAC TCTCATAGAC TTTACTGACA TGAAGTGCCA ACGAGGGGAT 
1051 CTAAGCTTCA TTTTCAATGG GGATGCGGCG CCCTCTGAAT CTTTTGTAGT 
1101 ATTAGACAAT GAACAAAAAG TTTATCAGCG AATACATCAT GAGGAATCAG 
1151 AGATGGAAAC AGAAGAAGAG GTGGATATTT TAATGAGCAG TGATATTTAC 
1201 TCTGCAACTT TATCAACAAA ATCAATTTCT TTCACGCGTG CCCAGACAGG 
1251 ATGGCTTTTT CGGGAAGATA AAACAGAAAG AGTAGGAAAC TTTTTGGCAG 
1301 ACTTTTACCT GGTGAATGGA CTTGTTATAG AATCAAGGAA AAGAAGAGAA 
1351 CATCTCAGTG AAGAGGATAT TCTTCGAAAT AAGGCCATCA TGGAGAGTTT 
1401 GAGTAAAGGT GGAAACATAA TGGAACAGAA TTTTGAGCCG ATTCGAAGAC 
1451 AGTCTCTTAC ACCGCCTCCT CAGAACACTA TTACATGGGA AGAATATATA 
1501 TCTGCTGAAA ATGGAAAAGC TCCTCATCTG GGTAGAGAAT TGGTGTGCAA 
1551 AGAGAGTAAG AAAACGTTTA AAGCTACGAT AGCCATGAGC CAGGAATTTC 
1601 CCTTAGGGAT AGAGTTATTA TTGAATGTTT TAGAAGTAGT AGCTCCCTTC 
1651 AAGCACTTTA ACAAGCTTAG AGAATTTGTT CAGATGAAGC TTCCTCCAGG 
1701 CTTTCCTGTA AAATTAGATA TACCTGTGTT TCCCACAATC ACAGCCACTG 
1751 TGACTTTTCA GGAGTTTCGA TACGATGAAT TTGATGGCTC CATCTTTACT 
1801 ATACCTGATG ACTACAAGGA AGACCCAAGC CGTTTTCCTG ATCTTTAACT 
1851 GACGTGGAAA AGGATGCCGT CTAACCAAGG AAAGAAAATA CAGAGACCCT 
1901 AGAAGTGGAT CCAAATAGAA GGGACAAATG CTTTCAGTGA AGAAAAGGGA 
1951 ATTACACATT GAATCGACAC ATCAGTAATA CGATACAGTG AAATGGGCCT 
2001 CTAATAAGAA TTTCAGCGAG TTTTCTGATG TGCCATTTTT TGTCTTTTTA 
2051 AAAATATACA TATTATAAAT GTAATAGTTT GACACATTAA TGACCCTAAG 
2101 ACCTGCGTAT GTGAAGCAGC TATGAGTGCT GTGATTTGTT TTTAAAAATT 
2151 TTTACACTTC TTGTTGAAAT ATATATGCAT ATAAATATAT CTATATCTAT 
2201 ATCTATATCT AAAACACTCC TGGACCATTA ACGTAAATTA AATGTCTTAA 
2251 GAGATATGGA GCCCTTTTAA ACTTGTCATC TTTATGCAAG GTGACATTTA 
2301 TAAATATTCC TTCGAGCTTT GTTTTCATAA AATGTAAACT ATGTAACATT 
2351 ATGTATAGTT CAGTAATTTG AATGTTTGTT CAATATAATG AACTAGAAGG 
2401 AATGCAATTT TCTGTAGATG AATGAACCAA ATGGTAACCA TTAAACAATT 
2451 GCATTTATAT GTTGCAATAC ATTTCAGAAG GAGCGTTCAC TCTGCAGGGA 
2501 ATAAGGTACC TCCTTTAGCA CCTTAGTGCA ATTCATTGTG GTGCTATTTG 
2551 TTTTTACCTG AATGTTTGTT ACTAATCTTC CTTTCATAGA ACCTCTATTT 
2601 TTTTTTTTTC TAAACTTGAG TTTGAGTCCT TGTTATGGTC ATCATAAGGT 
2651 AATGGTTAGC ATGTTTAAAG ATATTCCTCT TCCAAATCTC AGCACTTTAA 
2701 AAAAAAATCC AAATTTTTAA ACTTGCTTCC TAATAAGTAC ACATCGGTCT 
2751 GATTATTTTG TTTGTTTTTA GTAGAATATG GATGCATTGG TGTCAGTTTT 
2801 AAAAAACAAT ACACATATTT TGGACAACCC TACATATTTA ATCCTTTCAA 
2851 AATAAGATAA AAACATTTTA TATGCTAACA GAATATATTT GTTACAAGTT 
2901 AAAGTCCAGA AGTATACACA AGATTGATTA CTCCTATTAT TTTTTTTAAA 
2951 TCACAGGAAA ATATTGATTT CATTGTCTCC AAAGTGATAA AATCTTGTAT 
3001 TACTCATTTT TGCACTTAAA ATTTTTCTTA TTTATTCCAA GGTGGTTTGA 
3051 AGGTCCAAGT ATGAAAATAA ATTAGGGGGA TTAATGTATA ACAGTTATAA 
3101 AGTATCATGT TGTATTAAAG AGCTTACTTA GATTGATGTT TTTAAAATGT 
3151 ATCCTGATGA ATGTCTCAAG AATGCATCTG TCAAGTTTTT TAGACTGACC 
3201 AGTAGCTTAA ACTTTTTTCA GGATTTTAGG TAATTTGAAA GGAGTTTAGA 
3251 GACCCTTATT GAAAATATGA TTTAAAAATC CAAAGCATAA ACCGTAAGAA 
3301 AAATTTTAAA TAAACATCTT TAAAGCTGAA AAAAAAAAAA AAAAAA 



BLAST Results 



Entry HS121353 from database EMBL: 
human STS WI-14729. 
Score = 1697, P = 1.9e-69, identities = 363/379 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 328 bp to 1845 bp; peptide length: 506 
Category: similarity to unknown protein 



1 MTGEKIRSLR RDHKPSKEEG DLLEPGDEEA AAALGGTFTR SRIGKGGKAC 
51 HKIFSNHHHR LQLKAAPASS NPPGAPALPL HNSSVTANSQ SPALLAGTNP 
101 VAVVADGGSC PAHYPVHECV FKGDVRRLSS LIRTHNIGQK DNHGNTPLHL 
151 AVMLGNKVTA LLRKLKQQSR ESVEEKRPRL LKALKELGDF YLELHWDFQS 
201 WVPLLSRILP SDACKIYKQG INIRLDTTLI DFTDMKCQRG DLSFIFNGDA 
251 APSESFVVLD NEQKVYQRIH HEESEMETEE EVDILMSSDI YSATLSTKSI 
301 SFTRAQTGWL FREDKTERVG NFLADFYLVN GLVIESRKRR EHLSEEDILR 
351 NKAIMESLSK GGNIMEQNFE PIRRQSLTPP PQNTITWEEY ISAENGKAPH 
401 LGRELVCKES KKTFKATIAM SQEFPLGIEL LLNVLEVVAP FKHFNKLREF 
451 VQMKLPPGFP VKLDIPVFPT ITATVTFQEF RYDEFDGSIF TIPDDYKEDP 
501 SRFPDL 

BLASTP hits 
Entry CEC01F1_3 from database TREMBL: 

gene: "C01F1.6"; Caenorhabditis elegans cosmid C01F1. 

Score = 371, P = 4.5e-61, identities = 69/138, positives = 96/138 

Entry CEC18F10_9 from database TREMBL: 

gene: "C18F10.7"; Caenorhabditis elegans cosmid C18F10. 

Score = 383, P = 3.4e-39, identities = 103/349, positives = 182/349 

Entry AF064604_1 from database TREMBL: 

product: "KE03 protein"; Homo sapiens KE03 protein mRNA, partial cds. 
Score = 348, P = 8.3e-32, identities = 95/295, positives = 148/295 



Alert BLASTP hits for DKFZphfkd2_46d13, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_46d13, frame 1



Report for DKFZphfkd2_46d13.1



[LENGTH] 506 

[MW] 57003.12 

[pI] 6.40

[HOMOL] TREMBL:CEC18F10_9 gene: "C18F10.7"; Caenorhabditis elegans cosmid C18F10. 2e-35

[BLOCKS] BL01288E

[PROSITE] RGD 1

[PROSITE] MYRISTYL 7

[PROSITE] CAMP_PHOSPHO_SITE 2

[PROSITE] CK2_PHOSPHO_SITE 9

[PROSITE] PKC_PHOSPHO_SITE 6

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 7.51 %



SEQ MTGEKIRSLRRDHKPSKEEGDLLEPGDEEAAAALGGTFTRSRIGKGGKACHKIFSNHHHR
SEG xxxxxxxxxxxx
PRD ccceeeeeccccccccccccccccccchhhhhhhccccccccccccceeeeeeecchhhh

SEQ LQLKAAPASSNPPGAPALPLHNSSVTANSQSPALLAGTNPVAVVADGGSCPAHYPVHECV
SEG xxxxxxxxxxxxxxxx
PRD hhhhhhccccccccceeecccccccccccccceeecccccceeeecccccccccccceee

SEQ FKGDVRRLSSLIRTHNIGQKDNHGNTPLHLAVMLGNKVTALLRKLKQQSRESVEEKRPRL
SEG
PRD eccchhhhhhhhhhcccccccccccccceeeecccchhhhhhhhhhhhcchhhhhhhhhh

SEQ LKALKELGDFYLELHWDFQSWVPLLSRILPSDACKIYKQGINIRLDTTLIDFTDMKCQRG
SEG
PRD hhhhhhccccceeehhhhhccceeeeccccccceeeeeccceeeeeeeeecccccccccc

SEQ DLSFIFNGDAAPSESFVVLDNEQKVYQRIHHEESEMETEEEVDILMSSDIYSATLSTKSI
SEG xxxxxxxxxx
PRD ceeeeeccccceeeeeeeecccceeeehhhhhhhhhhhhhhhhhhhhccceeeecccccc

SEQ SFTRAQTGWLFREDKTERVGNFLADFYLVNGLVIESRKRREHLSEEDILRNKAIMESLSK
SEG
PRD eeeecccceeeecccchhhhhhheeeeeeeeeeeeehhhhhhhhhhhhhhhhhhhhhhhc

SEQ GGNIMEQNFEPIRRQSLTPPPQNTITWEEYISAENGKAPHLGRELVCKESKKTFKATIAM
SEG
PRD cceeeccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhh

SEQ SQEFPLGIELLLNVLEVVAPFKHFNKLREFVQMKLPPGFPVKLDIPVFPTITATVTFQEF
SEG
PRD hhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccceeeeeeeeeeehhhhhhhcc

SEQ RYDEFDGSIFTIPDDYKEDPSRFPDL
SEG
PRD cccccccceeeccccccccccccccc



Prosite for DKFZphfkd2_46d13.1
PS00001 82->86 ASN_GLYCOSYLATION PDOC00001
PS00004 126->130 CAMP_PHOSPHO_SITE PDOC00004
PS00004 373->377 CAMP_PHOSPHO_SITE PDOC00004
PS00005 8->11 PKC_PHOSPHO_SITE PDOC00005
PS00005 296->299 PKC_PHOSPHO_SITE PDOC00005
PS00005 316->319 PKC_PHOSPHO_SITE PDOC00005
PS00005 336->339 PKC_PHOSPHO_SITE PDOC00005
PS00005 410->413 PKC_PHOSPHO_SITE PDOC00005
PS00005 413->416 PKC_PHOSPHO_SITE PDOC00005
PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006
PS00006 172->176 CK2_PHOSPHO_SITE PDOC00006
PS00006 228->232 CK2_PHOSPHO_SITE PDOC00006
PS00006 274->278 CK2_PHOSPHO_SITE PDOC00006
PS00006 278->282 CK2_PHOSPHO_SITE PDOC00006
PS00006 344->348 CK2_PHOSPHO_SITE PDOC00006
PS00006 386->390 CK2_PHOSPHO_SITE PDOC00006
PS00006 476->480 CK2_PHOSPHO_SITE PDOC00006
PS00006 491->495 CK2_PHOSPHO_SITE PDOC00006
PS00008 35->41 MYRISTYL PDOC00008
PS00008 46->52 MYRISTYL PDOC00008
PS00008 108->114 MYRISTYL PDOC00008
PS00008 138->144 MYRISTYL PDOC00008
PS00008 155->161 MYRISTYL PDOC00008
PS00008 320->326 MYRISTYL PDOC00008
PS00008 487->493 MYRISTYL PDOC00008
PS00016 239->242 RGD PDOC00016



(No Pfam data available for DKFZphfkd2_46d13.1)
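Each Prosite row above gives four fields: a pattern accession, a residue range written as start->end, a site name and a documentation entry. Comparing the ranges with the peptide sequence suggests that start is the 1-based first residue and end is exclusive. For example, the ASN_GLYCOSYLATION hit 82->86 covers the four residues NSSV at positions 82-85. The sketch below rewrites a few PROSITE patterns (PS00001, PS00005, PS00006, PS00008, PS00016) as Python regular expressions and reports hits in that format. It is an illustration, not the Pedant scanner. It scans only short fragments of the DKFZphfkd2_46d13.1 peptide, each given with its starting residue number. The names `PATTERNS` and `scan` are hypothetical.

```python
import re

# PROSITE consensus patterns rewritten as Python regexes.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",         # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",              # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",           # PS00006: [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",  # PS00008
    "RGD": r"RGD",                                 # PS00016: R-G-D
}


def scan(fragment: str, site: str, first_residue: int = 1) -> list:
    """Return (start, end) hits with a 1-based start and an exclusive end, like '82->86'."""
    # A zero-width lookahead with a capturing group lets overlapping hits be reported.
    regex = re.compile(f"(?=({PATTERNS[site]}))")
    return [
        (m.start() + first_residue, m.start() + first_residue + len(m.group(1)))
        for m in regex.finditer(fragment)
    ]


# (fragment, number of its first residue) from the DKFZphfkd2_46d13.1 peptide listing
checks = {
    "PKC_PHOSPHO_SITE": ("MTGEKIRSLR", 1),
    "CK2_PHOSPHO_SITE": ("RDHKPSKEEG", 11),
    "MYRISTYL": ("AAALGGTFTR", 31),
    "ASN_GLYCOSYLATION": ("HNSSVTANSQ", 81),
    "RGD": ("DFTDMKCQRGDLSFIFNGDA", 231),
}
hits = {site: scan(frag, site, first) for site, (frag, first) in checks.items()}
for site, found in hits.items():
    print(site, ", ".join(f"{a}->{b}" for a, b in found))
```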
DKFZphfkd2_46j20 



group: metabolism 

DKFZphfkd2_46j20 encodes a novel 224 amino acid protein similar to 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase.

The new protein seems to be the human ortholog of 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase.

The new protein can find application in modulating the homoprotocatechuate degradative pathway
and as an enzyme for biotechnological production processes.



strong similarity to 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase
complete cDNA, complete cds, EST hits,

potential start at Bp 16 matches Kozak consensus ANCatgG

strong similarity to proteins of worm, plant, archaea and bacteria

2-hydroxyhepta-2,4-diene-1,7-dioate isomerase is part of tyrosine metabolism (late step of tyrosine degradation), EC 5.3.1.-

complete cds according to similar C. elegans and A. thaliana proteins

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 1706 bp 

Poly A stretch at pos. 1686, polyadenylation signal at pos. 1667 
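The annotation notes that a potential start at bp 16 matches the Kozak consensus ANCatgG: A at -3, any base at -2, C at -1, then the ATG, then G at +4. The sketch below implements that check. It assumes, as the annotation does, 1-based positions for the A of the ATG, and it scans only the first 30 bases of this entry. `matches_kozak` is an illustrative name. On this fragment the ATG at bp 7, where the reported ORF begins, fails the test because it has T at -3, while the ATG at bp 16 passes.

```python
import re

# 'ANCatgG' from the annotation: A at -3, any base at -2, C at -1, ATG, G at +4.
KOZAK = re.compile(r"A.CATGG")


def matches_kozak(seq: str, atg_pos: int) -> bool:
    """True if the ATG whose A is at 1-based position atg_pos sits in ANCatgG context."""
    i = atg_pos - 1
    if i < 3 or seq[i:i + 3] != "ATG":
        return False
    # fullmatch with pos/endpos tests exactly the 7-base window -3..+4
    return KOZAK.fullmatch(seq, i - 3, i + 4) is not None


# First 30 bases of DKFZphfkd2_46j20.
seq = "CACTTGATGG" "GAATCATGGC" "AGCATCCAGG"
atgs = [p for p in range(1, len(seq) - 1) if seq[p - 1:p + 2] == "ATG"]
kozak_hits = [p for p in atgs if matches_kozak(seq, p)]
print("ATG positions:", atgs)
print("Kozak-context ATGs:", kozak_hits)
```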



1 CACTTGATGG GAATCATGGC AGCATCCAGG CCATTGTCCC GCTTCTGGGA 
51 GTGGGGAAAG AACATCGTCT GCGTGGGGAG GAACTACGCG GACCACGTCA 
101 GGGAGATGCG CAGCGCGGTG TTGAGCGAGC CCGTGCTGTT CCTGAAGCCG 
151 TCCACGGCCT ACGCGCCCGA GGGCTCGCCC ATCCTCATGC CCGCGTACAC 
201 TCGCAACCTG CACCACGAGC TGGAGCTGGG CGTGGTGATG GGCAAGCGCT 
251 GCCGCGCAGT CCCCGAGGCT GCGGCCATGG ACTACGTGGG CGGCTATGCC 
301 CTGTGCCTGG ATATGACCGC CCGGGACGTG CAGGACGAGT GCAAGAAGAA 
351 GGGGCTGCCC TGGACTCTGG CGAAGAGCTT CACGGCGTCC TGCCCGGTCA 
401 GCGCGTTCGT GCCCAAGGAG AAGATCCCTG ACCCTCACAA GCTGAAGCTC 
451 TGGCTCAAGG TCAACGGCGA ACTCAGACAG GAGGGTGAGA CATCCTCCAT 
501 GATTTTTTCC ATCCCCTACA TCATCAGCTA TGTTTCTAAG ATCATAACCT 
551 TGGAAGAAGG AGATATTATC TTGACTGGGA CGCCAAAGGG AGTTGGACCG 
601 GTTAAAGAAA ACGATGAGAT CGAGGCTGGC ATACACGGGC TGGTCAGTAT 
651 GACATTTAAA GTGGAAAAGC CAGAATATTG AGTTATTTCT TAACAAGTTT 
701 CGAGAGAGAA GGGAGCAAGA CAAGAGCAAG CAACGGCTAT TAAATGTCAC 
751 AATCCTTTAA TTAGAAACCA TTTATTGGCC GGACGCGGTG GCTCACGCCT 
801 GTAATCGCAG CACTTTGGGA GGCCGAGGCG GGCGGCTCAC GACGTCAGGA 
851 GATCCAGACC ATCTTGGCTA ACAGGGTGAA ACCCCGTCTC TACTAAAAAT 
901 ACAAAAAATT AGCCGGGCGT GGTGGCGGGC GCCTGTAGTC CCAGCTACTC 
951 TGGAGGCTGA GGCAGGAGAA TCAATTGAAC CCGGGAGGCG GAGCTTACAG 
1001 TGAGCTGAGA TTGCGCCACT GTACTCCTGG GCAACAGCGA GACTCCGTCT 
1051 CAAAAAAAAA AAAAAAAAAA AGAAACCATT TATTTTAAAA ATGATTAGAT 
1101 TGCTATGCCT CAACTCATAG AAGATGAACC CTTCAAGAAA ACGTGAAGTA 
1151 GAACGGGTGG GCCAGAAATG AAAACAGGCA AGTAAAGTAT TTCTTCGGAA 
1201 AACATTTTAT CAAACCAAAT GTTAAAAAGA CTTTCCTTTT GTAAAACTGG 
1251 ATTAGAGAAG ACTTTTCAGT GGGTTATCTC TAGGATGATC AGTAGTTCAG 
1301 CACTTAAAAA CTGCAGAGAA AACTGAAAGT TATGTTCCAG ATAACTTTCC 
1351 GTTGTTTACC AAATTTTCTT AGATTTGGTC ATCATCAGGA AGCATTTGTA 
1401 AAAATAAAAA TCTCCACAAA TTACTGGCCC ATCTCGGACT TGCTGAATCA 
14 51 ATTTGATAGG ATTAATCTCC AGTGAAGCTG TGTTTACAGG GCATTCCAAG 
1501 TGATTCTTAT CAGGAAATGT GAAAAACACT CCTGTACATA ATCGGTTAAT 
1551 TTAAAATTTT ACTTAATAAG TGAACAAGTA ATGAAGATTT CACCTGTTTA 
1601 CTTAGGGTAT CTACCCAGAC CCATCGATTC TGAGTTCGGG AGATGATTTT 
1651 GAAATTACTG TTTTCCAAAT AAAGGTGCTC CCTTCCAAAA AAAAAAAAAA 
1701 AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 
94039092: Purification, nucleotide sequence and some properties of a bifunctional 
isomerase/decarboxylase from the homoprotocatechuate degradative pathway of Escherichia coli C. 



Peptide information for frame 1 



ORF from 7 bp to 678 bp; peptide length: 224 
Category: strong similarity to known protein 



1 MGIMAASRPL SRFWEWGKNI VCVGRNYADH VREMRSAVLS EPVLFLKPST 
51 AYAPEGSPIL MPAYTRNLHH ELELGVVMGK RCRAVPEAAA MDYVGGYALC 
101 LDMTARDVQD ECKKKGLPWT LAKSFTASCP VSAFVPKEKI PDPHKLKLWL 
151 KVNGELRQEG ETSSMIFSIP YIISYVSKII TLEEGDIILT GTPKGVGPVK 
201 ENDEIEAGIH GLVSMTFKVE KPEY 
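Translating the reported ORF from bp 7 with the standard genetic code reproduces the N-terminus listed above. The Met encoded at the Kozak-context ATG (bp 16) is residue 4. The sketch below is a minimal translation routine built on the standard codon table (NCBI translation table 1). The helper names are illustrative. The two test fragments are copied from the nucleotide listing of this entry: bp 7-51, and bp 667-681, which ends with the TGA stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# bp 7-51 and bp 667-681 of DKFZphfkd2_46j20
n_terminus = translate("ATGGGAATCATGGCAGCATCCAGGCCATTGTCCCGCTTCTGGGAG")
c_terminus = translate("AAGCCAGAATATTGA")
print(n_terminus, c_terminus)
```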



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfkd2_46j20, frame 1

PIR:S44919 ZK688.3 protein - Caenorhabditis elegans, N = 1, Score = 537, P = 8.7e-52

PIR:D71109 probable 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase - Pyrococcus horikoshii, N = 1,
Score = 529, P = 6.1e-51

PIR:C71425 hypothetical protein - Arabidopsis thaliana, N = 1, Score = 519, P = 7e-50

PIR:A64864 probable 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase b1180 - Escherichia coli, N = 1,
Score = 474, P = 4.1e-45



>PIR:S44919 ZK688.3 protein - Caenorhabditis elegans
Length = 214

HSPs:



Score = 537 (80.6 bits), Expect = 8.7e-52, P = 8.7e-52
Identities = 99/211 (46%), Positives = 138/211 (65%)



Query: 10 LSRFWEWGKNIVCVGRNYADHVREMRSAVLSEPVLFLKPSTAYAPEGSPILMPAYTRNLH 69
L+ F IVCVGRNY DH E+ +A+ +P+LF+K ++ EG PI+ P +NLH
Sbjct: 4 LAGFRNLATKIVCVGRNYKDHALELGNAIPKKPMLFVKTVNSFIVEGEPIVAPPGCQNLH 63

Query: 70 HELELGVVMGKRCRAVPEAAAMDYVGGYALCLDMTARDVQDECKKKGLPWTLAKSFTASC 129
E+ELGVV+ K+ + ++ AMDY+GGY + LDMTARD QDE KK G PW LAKSF SC
Sbjct: 64 QEVELGVVISKKASRISKSDAMDYIGGYTVALDMTARDFQDEAKKAGAPWFLAKSFDGSC 123

Query: 130 PVSAFVPKEKIPDPHKLKLWLKVNGELRQEGETSSMIFSIPYIISYVSKIITLEEGDIIL 189
P+ F+P IP+PH ++L+ K+NG+ +Q T MIF IP ++ Y ++ TLE GD++L
Sbjct: 124 PIGGFLPVSDIPNPHDVELFCKINGKDQQRCRTDVMIFDIPTLLEYTTQFFTLEVGDVVL 183

Query: 190 TGTPKGVGPVKENDEIEAGIHGLVSMTFKVE 220
TGTP GV + D IE G+ ++ F V+
Sbjct: 184 TGTPAGVTKINSGDVIEFGLTDKLNSKFNVQ 214





Pedant information for DKFZphfkd2_46j20, frame 1



Report for DKFZphfkd2_46j20.1



[LENGTH] 224

[MW] 24843.07

[pI] 6.96

[HOMOL] PIR:S44919 ZK688.3 protein - Caenorhabditis elegans 8e-55

[FUNCAT] r general function prediction [M. jannaschii, MJ1656] 9e-40

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YNL168c] 4e-38

[EC] 5.3.3.10 5-Carboxymethyl-2-hydroxymuconate delta-isomerase 1e-35

[PIRKW] isomerase 1e-35

[PIRKW] intramolecular oxidoreductase 1e-35

[SUPFAM] 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase 1e-46

[PROSITE] MYRISTYL 4

[PROSITE] AMIDATION 1

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 3

[KW] Alpha_Beta



SEQ MGIMAASRPLSRFWEWGKNIVCVGRNYADHVREMRSAVLSEPVLFLKPSTAYAPEGSPIL 

PRD cccccccccchhhhhhcceeeeeecchhhhhhhhhccccccceeeecccccccccccccc 

SEQ MPAYTRNLHHELELGVVMGKRCRAVPEAAAMDYVGGYALCLDMTARDVQDECKKKGLPWT 

PRD cccccchhhhhhheeeccccccccchhhhhhhheeeeeeccchhhhhhhhhhhhcccccc 

SEQ LAKSFTASCPVSAFVPKEKIPDPHKLKLWLKVNGELRQEGETSSMIFSIPYIISYVSKII 

PRD cccccccccccceeeecccccccccceeeeecccccccccccccceeechhhhhhhhhhh 

SEQ TLEEGDIILTGTPKGVGPVKENDEIEAGIHGLVSMTFKVEKPEY 

PRD hccccceeeeccccccccccccceeeeeeccccccccccccccc 



Prosite for DKFZphfkd2_46j20.1
PS00005 104->107 PKC_PHOSPHO_SITE PDOC00005
PS00005 192->195 PKC_PHOSPHO_SITE PDOC00005
PS00005 216->219 PKC_PHOSPHO_SITE PDOC00005
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006
PS00006 181->185 CK2_PHOSPHO_SITE PDOC00006
PS00008 2->8 MYRISTYL PDOC00008
PS00008 75->81 MYRISTYL PDOC00008
PS00008 116->122 MYRISTYL PDOC00008
PS00008 191->197 MYRISTYL PDOC00008
PS00009 78->82 AMIDATION PDOC00009



(No Pfam data available for DKFZphfkd2_46j20.1) 






WO 01/12659 



PCT/IB00/01496 



DKFZphfkd2_46k19 



group: transcription factors 

DKFZphfkd2_46k19.3 encodes a novel 130 amino acid protein similar to rat Dcoh, a bifunctional
protein-binding transcriptional co-activator.

Dcoh is a bifunctional protein, complexed with biopterin. It serves as a dimerization cofactor
of hepatocyte nuclear factor-1 and catalyzes the dehydration of the biopterin cofactor of
phenylalanine hydroxylase.

The new protein can find application in modulating/blocking the expression of genes controlled 
by the hepatocyte nuclear factor-1. 



strong similarity to pterin-4-alpha-carbinolamine dehydratase 

potential start at Bp 102 according to similar proteins, 
both genomic sequences are from chromosome 5, 

Sequenced by MediGenomix 

Locus: map="5" 

Insert length: 5641 bp 

Poly A stretch at pos. 5617, polyadenylation signal at pos. 5598 



1 CAGCCCTCGG CAGACGGCCA ATGGCGGCGG TGCTCGGGGC GCTCGGGGCG
51 ACGCGGCGCT TGTTGGCGGC GCTGCGAGGC CAGAGCCTAG GGCTAGCGGC
101 CATGTCATCA GGTACTCACA GGTTGATTGC AGAGGAGAGG AACCAAGCTA
151 TACTTGACCT TAAAGCAGCA GGATGGTCGG AATTAAGTGA GAGAGATGCC
201 ATCTACAAAG AATTCTCCTT CCACAATTTT AATCAGGCAT TTGGCTTTAT
251 GTCCCGAGTT GCCCTACAAG CAGAGAAGAT GAATCATCAC CCAGAATGGT
301 TCAATGTATA CAACAAGGTC CAGATAACTC TCACCTCACA TGACTGTGGT
351 GAACTGACCA AAAAAGATGT GAAGCTGGCC AAGTTTATTG AAAAAGCAGC
401 TGCTTCTGTG TGATTTCTTC CAAAATACAT AAGTCTGAGA GGCTAAACTT
451 GATGGCTGTG TTAACATATG TCACGTGTAG CACAGTGGAG AAAGCAGGAT
501 ATGGCTCATA ATGACAGTGG TGAAGACCTG CGAATGAAGT TGCTAGTTAA
551 CACCTACATT AGGGTTTGAC ATAGGTCTAT GTTATGGGTC GCTGCATCTG
601 CTGGAACTCA CAGACTTTAC TATAGAGAAT CAAAGATCCC GTATCCGAAG
651 TCTATGGAAA TGCTCATGGT GGTAAATTCC AACAGAATGA AACACCAAAC
701 TTGCTTAAAG TAACTCACGT TTCAATTTGA AAGAGATATT GTCAAAATTG
751 GAGGCCCCCA GGTTCCTGTC TGTTCCAAAT CTTTGCATGA TGACAGTGGT
801 TTCTCTGATG TGGTAAGCTT TGGCTTTCTT CTGTTTTCTT TCTAAAAGAT
851 CACTGGAGTA GAGAGGAGTT AAACAGACAT GACCTTTGAC CTCTTGCATG
901 ACCTCCACAG ATAGCAAACC GGGCCGACAC ATGGTTGACG ATGTCCTTTT
951 CTACAATGAA GTTAATGAAA GTTCTGAAAA TAGTGATTAC TTTCTGACAT
1001 TGATAGGATT TAGGAAACCT CTGGATAAAT AGCTTAAGCA TGGCTGTTTA
1051 TGTTTTTGCT ATAGACAAAA AGCAGCAGCA TGTACATTGT ATTTGGACAC
1101 AAGCCTGCCT CGGTTAATAT ATTGAACTAT TGGACCACTA GGGTTAGTAG
1151 GGAGCGGTCT GTACACTTTC TGATTCAGCA TTCAGAAACA TTCTAGGTGG
1201 ACTCTGTAGC TTTCAGTTTT GTAAAGTTAT CGGAAAAACA TCGGGAGGGT
1251 TTGGCCATCA TATGTGAGCT TTGTGTTTCA ATGCCAGTTA CTCAGGATTA
1301 GTAAATTAAT GACTGTCCAG AGGACTTCAG GGTCACCAAG CTGCTGCACC
1351 TGCCATTGGC TGACTCTCCC CGGCTATCTG TGGCTGAGAT GGTGCTGCTT
1401 AGGTCACGCA GAGCATGAGC TGCTGCTGAA AGGGCACAGG AGATGGCCCT
1451 TGGGCTTCTC ATCCCAGGAT GCCTGCCCTG CCCACCAATC CATGAGAAGA
1501 TATGTATGAT TTCAGTAGGC CCTGGATCAG CTTGTCACCT CTGGTTTCCT
1551 GTTTGCTTTC CACTCACTCA GCTGGAGTTT CATTTCCAGA CTAAAGTCTT
1601 CATCATTGGC TTCAGAAACA GCATTCATCT GTGGCTGTGC TGATGTAGTA
1651 CACCAAGAAC AACTGGGCTC TTCTCTGTCA CTTTCAGTGG GCTACCTTCC
1701 CTCACCTCTC CAAGCAGCAT GAAAGAATTC TTTACATTTT TAATCTCTTT
1751 TTTGTTTTTC CCTGAAAGTA TGCTTTGGTG CTTAAAGAGA GAAGTCACAA
1801 AAGTATACTA CTGAGTTTCC TGGAGATGAA ATCCTGTTGT CCCTAGCTAT
1851 GTGAATGAGC ACAGGGATCC CTGATGCCAT TATTTTGTAT ATTCATACGG
1901 CACACACTTA CTGAGGGCCT TCTGTGTGCC CTAGGGGATT GAGCACAGTG
1951 ACATATCAGG GCAGGTAGAA ACAGATGGAG AGCTGATGCG GGCTGTCTTA
2001 GAGCAGCTGC CCCAGGAGGC CCCTGTGGAT GGATGTTGGG CAGGAGCCCT
2051 GAGACGTTAG GGGCATATAA CTAAAGGACA TAGCAGGAGT TATAGGAGGA
2101 GCTGATCCCT GAGGGAAACA ATGAAGACGG AGAAGATGGG GCTAAAGTTT
2151 GAATTGTGGG GACATTAATC ACGGTGATTC TTAAAACTTT GCTGTTGATG
2201 ATTTTAAATG GAGAAAATGA GTACGTAAGA TGTTATTTCC CAGTTCAGTA
2251 TATAGGTTGC CCACAAAGTA TTTTCCTACC ATGAATGGTC ATATATACTT
2301 GTTGTAGAAT ACCAGGGACA GCAGAGATGG TGGGGTAGTT ACTTCCTTTT
2351 CTTACAGCCC AAGAACTTTG GTGTCCAGGA GATTGACCAA TTTAGCCACT
2401 GAGCATTTAA TACAACACAG GGCTACCCAG ATCCCACTGT CCTGATTTGC
2451 CCTGAAAGCC AAAGGAGTCA GGAGAAGGTG AGTGGGGTGA ATATATTAAT
2501 CCTGAGAGTT GAACAGAGCA AAAATCCCTA TTACTTTTGT ACTTAAAACA
2551 TCTCTGCCAC ATGTGCTCAC TCTTTATATT CTGTTTAGGT GGTTTATATG 
2601 TGCACATCCC ATCCTATGCC TGCAGTTAGC CAACTCAGGG TTTATATTGC 
2651 CTCCTTTCTT TTTTTCTTTT TTTTTTTTTT TTTTAAGAGA TGGGGTCTCG 
2701 TTCTGTCATG CAGACTGGAG TGCAGTGGTG TGATCACAGC TCATTGTAAC 
2751 CTCCAACGCC TGGACTGAAG TGATCCTCCT GCCTTGGCCT CTCTGGTAGC 
2801 TGGGACTACA GGTGCATGCC ACCACACCCA CCTAATTTTT TTTATTTTTA 
2851 TTTTTTGTAG AGACAGTCTC ACTATCTTGC TCGGGCTGGT CCTGAACTCC 
2901 TGGGCTCAAG TTATCTTGCT GCCTCAGCCT CCCATGGGTA ATCTTTATTT 
2951 CCTTTTTTTT TTTTTTTTGG AGATGGAGTT TCGCTCTTGT CGCCCAGGCT 
3001 GGAGTGCAAT GGCACGATCT TGGCTCACTG CAGTCTCCAC CTCCTGGGTT 
3051 CAGGTGATTC TCCATCCTCG GCCTACTGAG TAGCTGAGAT TACAGGCAAC 
3101 TGCCACCATG CGCGGCTAAT TTGTGTATTT TTTTTTAGTA AGAGATGGGG 
3151 TTTCGCCATG TTGGCCGGAC TGGTCTTAGA CTCCTGACCT CAAGCGACCT 
3201 GCCTGCCTTG GCCTCCCAAA GTGCTGGGAT TACAGGCATG AGCCGCTATG 
3251 CCTCGTCGCT GATTTTTATT TCTTATTTTT TTTTTAGAGA TGGGGGTCTC 
3301 ACTATGCTGC TCAGGCTGAT CTCAAACTCC TGGCCTCAAG TGATCCTCCC 
3351 ACCTTAGCCT CCCAAGTTGC TGGGATTATA AGTGTGAGCC ACTATCCCTA 
3401 CCTCACTATT ACCTTCTTTG CTTCTCTTGT TTTCTTTTGT TCTAAGTCAA 
3451 ACCCATCACA ATCTTTTCTT GTCCTTCCAG GTGTTTTCCA GTGCTGTGCC 
3501 CTGGATGTGC TCTCTTTCTC TTAGAGCCCA GAGAACTTGC TTTTCCCCCT 
3551 TATATATGAC CCTTAACTTT TTCTAACACA TTATTAAGGG CCTGTGTCTA 
3601 TCAGCTGGGG GCACTTCTTG AAGGGAGGGC CTTTGTGTGG TCTGTTTCTA 
3651 GTGACTTCCA GCTTTAACCC AGAGCCTCAT GATTGCTGGG TGCCCATAGC 
3701 CTTTTTGCTG AATGGAGGCA CTCAGTCTCC TTGGGAAGAG AGAATCCATG 
3751 ATAGACCCAC TTGGGAGCTC CCCACTTCAG GGGCCTACAC ACTGGTAATG
3801 CAACAGAATG CCCAAGAGTG ACCTCATAAA GCAAGGATTC CCTTCGTGGC 
3851 CCCTTCTCTG CTGCCTCTCA GAATCCAGAC GCTAAGGAAA ATCCCTAAGC 
3901 AGAGATTTTC TGTTGGATGC TAAAAGCAAG GAATAAAAGT TGAAAATTTG 
3951 GAAAATGTCT CAACACCGTC ACCAGCGCCA CTCGAGAGTC ATTTCTAGTT 
4001 CACCAGTTGA CACTACATCG GTGGGATTTT GCCCAACATT CAAGAAATTT 
4051 AAGTAAATAT TATCTATCTC CATTGCCTGT TAAGAAATGT GCTAGTAGAA 
4101 GTGTGAGGGC AGGGTGTCAG TGTTCTCTCA GCCTCTTCCC TCAGATACTC 
4151 GTCTGCTTAC CAAAATAAGT TGCATGTCCT TGACAATCTG GTTTCTATGA 
4201 TTGGTGAGGC TGGCATGCTA TTACCTTTAT GTGCCCTGTA GACTTGAATG 
4251 ACCAGTTTGA CCAGTTTGAC TGTTAGATAA TCAGAAGGCT TTTCTCTTTT 
4301 TTTATAATAG ACCCCATCTC AAATCAGATA ATGAAAATTA CATATCTTGA 
4351 TATATTAGAA AAGTATATAC ATTCTGGCTG GGCACGGTGG CTCACGCCTG 
4401 TAATCCCTGC ACTTTGAGAG GCTGGGGCGG ATCACTTGAG GTCAGGAGTT 
4451 TGAGACCGGC CTGGCCAGCG TGGCGAAACC CCATCTCTAC TAAAAATACA 
4501 CAGATTAGCC CGGAGTGATG GTGTGCACCT GTTGTCCCAG CTACTCAGGA 
4551 TGCTGAGGCA GGAGAATCCC TTTAACCTGG GGGGCGAAGG TTGCAGTGAG 
4601 CCAGGATTGC ACCACTGCAC TCCAGCCTGG GTGACGGAAC GGGACTCTGT 
4651 CTCAGAAAAA AAAAAAAAGA AGAGGAAAAA GAAAAATATA TATTCTATAT 
4701 TTTTTTAACT TATGAGAATG TGTTCATTTC ATTTGTAACA TATAATGGGA 
4751 AACAGTAATA CGTACTCTGA GAAAAATTGC AAAGCACAGA TAAATGGAAA 
4801 TAAACAGGAA AAAGAATCAC CTATAACCTC ACCATCCATA GACAGACACT
4851 GTTAAAATTT TGGCATATTT CCTGCTGATT TTTTCTACTG CTGATTTTTG 
4901 CACAGGTGAG ATAATTTTGA ACAGAGAATT TTGTATCTTT GGTTTTTGTG 
4951 TTTCGCTGCA CACAAAAACA AAAGATATAA AAATGGATCA TAAACATTTT 
5001 TCTAAATCCT GAAAAGTGCA TAGACATATT TTAGTGCCTG TATTTCACAA 
5051 GATGGACATA CCATAATTTA CTTACACAGT CCTTTTTGTT AGATGTTTAA 
5101 GTTGTTTTCA AGCTTCTCAG TGCTGGAAAA AATACTGAGA TAGACATGTT 
5151 TAGTTGAAGT TATTTCATTT CAGGTTATAT TATCTTGGGT CAGAGAATGA 
5201 ATGGTTCTCA GGCTTTTCAA AAGAGCTGGT CAGTTTTTAT GCCTCTGGCA 
5251 GTTTTTGAGA GTGCTCAATC ATACTACACT GTTGCCAGCA TTAGATCTTA 
5301 TCACATTTAA GTCATTGCTA ATTTTATAAA CAAAAACAAT GGTTTTACTT 
5351 TGCATCTCCC TGATTGGTGT TGCTGTAGAA CATATTTGGA GAAGTTTGTT 
5401 TGTCTTTGGT GTTTATTCCA TGAATAGATT GTGTGCCCAT TTTCTCTTGG 
5451 GGTATTCAGT TTTTTATTAC TGATGTGAGC ATGTGTATGG GTGATTATTT 
5501 GATGATTATC AGTTTTGCTT AGTAGACTGG CAATATTTAG TCTTGCTGTC 
5551 ACTGTGTTCC CAGTGCCAAC TAGATTGCTT GATATGTAGT TGCCACTCAA 
5601 TAAAGATTTG TTGAGTCAAT GAAAAAAAAA AAAAAAAAAA A 



BLAST Results 



Entry AC004764 from database EMBL: 

Homo sapiens chromosome 5, P1 clone 255g5 (LBNL H61), complete
sequence. 

Score = 11057, p = 0.0e+00, identities = 2217/2224 
Bp 428-5625 of cDNA == Bp 2912-8107 of AC004764 

Entry HSAC1555 from database EMBL: 

Homo sapiens (subclone l_d8 from BAC H75) DNA sequence, complete
sequence.

Score = 575, P = 5.1e-30, identities = 115/115
Bp ~240-430 of cDNA == HSAC1555, splice pattern
Medline entries



93186787: 

Phenylalanine hydroxylase-stimulating protein/pterin-4 
alpha-carbinolamine dehydratase from rat and human liver. 
Purification, characterization, and complete amino acid 
sequence.

93101632: 

Identity of 4a-carbinolamine dehydratase, a component of 
the phenylalanine hydroxylation system, and DCoH, a 
transregulator of homeodomain proteins. 

95242099: 

Crystal structure of DCoH, a bifunctional, protein-binding 
transcriptional coactivator 



Peptide information for frame 3 



ORF from 21 bp to 410 bp; peptide length: 130 
Category: strong similarity to known protein 



1 MAAVLGALGA TRRLLAALRG QSLGLAAMSS GTHRLIAEER NQAILDLKAA 
51 GWSELSERDA IYKEFSFHNF NQAFGFMSRV ALQAEKMNHH PEWFNVYNKV 
101 QITLTSHDCG ELTKKDVKLA KFIEKAAASV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46k19, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_46k19, frame 3



Report for DKFZphfkd2_46k19.3



[LENGTH] 130

[MW] 14377.56

[pI] 9.17

[HOMOL] PIR:A47189 pterin-4-alpha-carbinolamine dehydratase (EC 4.2.1.96) - rat 4e-34

[FUNCAT] 01.07.99 other vitamin, cofactor, and prosthetic group activities [S. cerevisiae, YHL018w] 5e-04

[SCOP] d1dchg_ 4.38.1.1.1 Pterin-4a-carbinolamine dehydratas 4e-50

[EC] 4.2.1.96 Tetrahydrobiopterin dehydratase 6e-34

[PIRKW] nucleus 6e-34

[PIRKW] carbon-oxygen lyase 6e-34

[PIRKW] homotetramer 6e-34

[PIRKW] hydro-lyase 6e-34

[PIRKW] cytosol 6e-34

[PIRKW] acetylated amino end 6e-34

[PIRKW] homodimer 6e-34

[SUPFAM] pterin-4-alpha-carbinolamine dehydratase 6e-34

[PROSITE] MYRISTYL 2

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 4

[KW] Alpha_Beta

[KW] 3D

[KW] LOW_COMPLEXITY 14.62 %

SEQ MAAVLGALGATRRLLAALRGQSLGLAAMSSGTHRLIAEERNQAILDLKAAGWSELSERDA

SEG .xxxxxxxxxxxxxxxxxxx

1dchB CCCCHHHHHHHHHHHHHHCCEEECCCCE

SEQ IYKEFSFHNFNQAFGFMSRVALQAEKMNHHPEWFNVYNKVQITLTSHDCGELTKKDVKLA

SEG

1dchB EEEEEECCCHHHHHHHHHHHHHHHHHHCCCCEEEETTTEEEEEECBTTTTBTCCHHHHHH

SEQ KFIEKAAASV

SEG

1dchB HHHHHHHHHH
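The [MW] value in these Pedant reports appears to be an average (not monoisotopic) molecular weight. As a minimal sketch, assuming the commonly tabulated average residue masses (the ExPASy values) plus one water for the termini, summing over the 130-residue frame-3 peptide lands within a fraction of a dalton of the 14377.56 listed above. The names `AVERAGE_RESIDUE_MASS`, `average_mw` and `DCOH_PEPTIDE` are illustrative and not part of the Pedant software.

```python
# Average residue masses in daltons (amino acid minus one water), as commonly
# tabulated (e.g. by ExPASy). Assumed values, not taken from the Pedant software.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mw(peptide: str) -> float:
    """Average molecular weight (Da) of an unmodified linear peptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER


# Frame-3 peptide of this entry, copied in 10-residue blocks.
DCOH_PEPTIDE = "".join([
    "MAAVLGALGA", "TRRLLAALRG", "QSLGLAAMSS", "GTHRLIAEER", "NQAILDLKAA",
    "GWSELSERDA", "IYKEFSFHNF", "NQAFGFMSRV", "ALQAEKMNHH", "PEWFNVYNKV",
    "QITLTSHDCG", "ELTKKDVKLA", "KFIEKAAASV",
])

if __name__ == "__main__":
    print(f"{average_mw(DCOH_PEPTIDE):.2f} Da for {len(DCOH_PEPTIDE)} residues")
```

Post-translational changes, such as the acetylated amino end noted under [PIRKW], are not included in this calculation.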



Prosite for DKFZphfkd2_46k19.3

PS00005  11->14    PKC_PHOSPHO_SITE   PDOC00005
PS00005  32->35    PKC_PHOSPHO_SITE   PDOC00005
PS00005  56->59    PKC_PHOSPHO_SITE   PDOC00005
PS00005  113->116  PKC_PHOSPHO_SITE   PDOC00005
PS00006  56->60    CK2_PHOSPHO_SITE   PDOC00006
PS00006  105->109  CK2_PHOSPHO_SITE   PDOC00006
PS00006  113->117  CK2_PHOSPHO_SITE   PDOC00006
PS00008  6->12     MYRISTYL           PDOC00008
PS00008  20->26    MYRISTYL           PDOC00008
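Positions in a PROSITE listing like the one above can be reproduced by converting each pattern to a regular expression and scanning the peptide. The sketch below assumes the published PROSITE patterns for PS00005, PS00006 and PS00008 (quoted in the code). It reports 1-based, inclusive coordinates. Measured against those coordinates, the second number of each a->b entry above appears to be one past the last matched residue: the PKC site T-R-R at residues 11-13 is listed as 11->14. `prosite_to_regex` and `scan` are hypothetical helper names, and only simple patterns without '<' or '>' anchors are handled.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern such as 'G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}'
    into a Python regular expression (no '<'/'>' anchors supported)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue spec from an optional repeat count: (n) or (n,m).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                 # any residue
            core = "."
        elif core.startswith("{"):      # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        # Single letters and [..] alternatives are already valid regex syntax.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(pattern: str, peptide: str) -> List[Tuple[int, int]]:
    """All matches as 1-based, inclusive (start, end) pairs; overlapping hits are kept."""
    # A capturing group inside a lookahead lets finditer report overlapping matches.
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1))) for m in rx.finditer(peptide)]


PATTERNS = {
    "PKC_PHOSPHO_SITE (PS00005)": "[ST]-x-[RK]",
    "CK2_PHOSPHO_SITE (PS00006)": "[ST]-x(2)-[DE]",
    "MYRISTYL (PS00008)": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",
}

# Frame-3 peptide of this entry, copied in 10-residue blocks.
DCOH_PEPTIDE = "".join([
    "MAAVLGALGA", "TRRLLAALRG", "QSLGLAAMSS", "GTHRLIAEER", "NQAILDLKAA",
    "GWSELSERDA", "IYKEFSFHNF", "NQAFGFMSRV", "ALQAEKMNHH", "PEWFNVYNKV",
    "QITLTSHDCG", "ELTKKDVKLA", "KFIEKAAASV",
])

if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name, scan(pattern, DCOH_PEPTIDE))
```

The lookahead keeps overlapping hits, so two sites that share residues (for example the PKC and CK2 sites both starting at residue 113) are reported independently.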



(No Pfam data available for DKFZphfkd2_46k19.3)
DKFZphfkd2_46m4



group: signal transduction 

DKFZphfkd2_46m4.3 encodes a novel 198 amino acid putative GTP-binding protein related to the
SAR1 family of Ras superfamily members.

SAR1 proteins are involved in vesicular transport between the endoplasmic reticulum and the
Golgi apparatus.

The new protein can find clinical application in modulating the transport of vesicles to the
Golgi apparatus, thus enabling post-translational modifications of the vesicle contents.
Blocking of the molecule is expected to result in modulation/blocking of secretory pathways.



nearly identical to mouse GTP-binding protein 
complete cDNA, complete cds, EST hits 
Sequenced by MediGenomix 

Locus: /map="438.9 cR from top of Chr10 linkage group"
Insert length: 2996 bp 

Poly A stretch at pos. 2969, polyadenylation signal at pos. 2958



1 ACATCCGGCG AGTAGCTGGC GGTCCCGGGT GCTGCTGGTT AGTGTGCTCT 
51 GAGGGAGGGT CCGAGCCAGC CGCTGTTTTG CCGGAGGAGC CCCTCAGGCC 
101 GTAGTAAGCA TTAATAATGT CTTTCATCTT TGAGTGGATC TACAATGGCT 
151 TCAGCAGTGT GCTCCAGTTC CTAGGACTGT ACAAGAAATC TGGAAAACTT 
201 GTATTCTTAG GTTTGGATAA TGCAGGCAAA ACCACTCTTC TTCACATGCT 
251 CAAAGATGAC AGATTGGGCC AACATGTTCC AACACTACAT CCGACATCAG 
301 AAGAGCTAAC AATTGCTGGA ATGACCTTTA CAACTTTTGA TCTTGGTGGG 
351 CACGAGCAAG CACGTCGCGT TTGGAAAAAT TATCTCCCAG CAATTAATGG 
401 GATTGTCTTT CTGGTGGACT GTGCAGATCA TTCTCGCCTC GTGGAATCCA 
451 AAGTTGAGCT TAATGCTTTA ATGACTGATG AAACAATATC CAATGTGCCA 
501 ATCCTTATCT TGGGTAACAA AATTGACAGA ACAGATGCAA TCAGTGAAGA 
551 AAAACTCCGT GAGATATTTG GGCTTTATGG ACAGACCACA GGAAAGGGGA 
601 ATGTGACCCT GAAGGAGCTG AATGCTCGCC CCATGGAAGT GTTCATGTGC 
651 AGTGTGCTCA AGAGGCAAGG TTACGGCGAG GGTTTCCGCT GGCTCTCCCA 
701 GTATATTGAC TGATGTTTGG ACGGTGAAAA TAAAAGAGTT TTACTTCTCT 
751 GGACTGATCC TATTCACAGC TTCCTCATGA ACTTTTCTAA TAGAACAAGG 
801 ATAGCTCTCC AACCATGTCT GGCGTTGAGA AGCCAAGAGT CTCTGTCAAC 
851 TCTCTCATTG CCCAGTGGTG ACATGTGCTC TTCTCCACAC TGTTGGGAGG 
901 TAATGCTGCC CCACGTGCTG GTGCAGGTCA GTATCCTGGG ACTTGGAAGC 
951 TGGCAGGATT TGCCGGGTAA AGCTGTATGC CATCATGGGG CACCTGAAAA 
1001 GAAAAACACG TCTCACCACT GTGGTTGATT CAAAAGAAAG TGATTCTATT 
1051 TTTTAAAGAA AGCGTTGTTA ATGTAATTGG TATCCCTCCT AACTTTTTGA 
1101 GTTCACAATT TACTTGGTCC AGAGTTTTCT ATTCTTTTTT TTTTTTTAAA 
1151 CTAATGAATG ACATTTAGAT ACTTCATAAA ATTATGAACA GATATGGAGG
1201 CCAGAGCTCA TTTGGGTAAA CTTACTCCTG CTGAGTTAGC AGGTTGGTGA 
1251 GAGAAGCTCC CCTGAGCTCA CCTGTCTCTC TGACTGCCTT GGAGTAGGTG 
1301 GCATAACCTT GTGCACAGAG AACTAGAAAA GGGGCAGAAC CCCGGCCTTG 
1351 CAGTTGTGGC AGGTTTCCAC TGTGGTAAGC TAGGTTCATT CCTCATCAAG 
1401 GAATGTGTAG CAGATTGTTC ACTGTGGAGG AGGTAATTAT AGAATGGGTT 
1451 ATTGTTGTTA TTCTTACTCA TGAAGTTACA GATTTTAGCC AGTCTTTGCT 
1501 TTTATACTTT TGTGAAATTT AATTTCTCTC TATAGCACCT TCCTTTTTCG 
1551 TTTTCAGTTA TCAAAAGTGA CTTTGACCTC ATAAGAGAGT TGAGAACATC 
1601 TCTCGTGTCA CATACTGCAG GTGCATCAGT TACTTTTGCA CAGATTCTAG 
1651 GGGGACATTT TTCTGAATAG GAAGACAGGA CAAAGTTAAC AGCTTAAGGG 
1701 CTCTTAATTC TGTGAGTTGA GGACTTAAAA GTATTGTAGC ATTTGTTTGG 
1751 ATCCATGAAA AATGTATTCA GTGGGCTTTA AAATTTCCAT TTGCAGAATT 
1801 TGGTCTCTCA GGCTGTTTGG GAGCTCTTTT TTTTACATTT TTTCTCCTTT 
1851 GACACCTATT TTATTGGTGT TTAAAGTAAA GGTTAACATC TGTAGCTTTT 
1901 CCAGGTTTTT TTTTTTTTTT TTGATATGAA ATTGTCTTTC TCCATTGCAG 
1951 AAATAAGCTA GGGAAACACT AACCCAAAAA CTTTCTGTAG AGCTGTTCCT 
2001 TTGGAGGCAG CATCACTTAT TGGCAGTAAA GACTCAGTAT AAAAGCACCA 
2051 GCATCCCTAC TTGGGTGATG GGGATTAATT TTATAGCATT CCATTTTCCT 
2101 AGTGCCACAT GTGAAATTGG ATTTTGATGA TCTTAATCTA TATTCTACCC 
2151 TTATAATAAA AGATCAAAAG ATATATCTCC TATGAACAGA TTGGAGATAG 
2201 GAGATGAAAA GTTGGGAGGA TGCCTTTATT CTAATGTGAG GGTAGGGAAA 
2251 ATGTGGATAA CATTACTGGG GTGAAGGAGG CATTGTTCTT TAGTTGGAGT 
2301 TCTCATTTTT ATTCTCCAGT ACTGACTTGT GGGGAAAGCA TACTTTTTCA 
2351 CTGCCAGGTA CTGAATGCAG AGGCTCAGTG AAGTATATAT GTGGGAAGTG 
2401 CATGCATTTC GTTTATTAGC AAACATAGCT GGATTAAGAC GAAGTTGTTG 
2451 GTTTGGAAAG GGGTTAAAGC CTTAAGTGAA CAAATCTAGC TAACAGTGAA 
2501 TGAACTAGGT AATATAACTT GCATATTTTT AATTTCCTTT GGTTAAAGGT 
2551 CCCCCATACT TCTCTGTTCG GAGACATGAG AAGTATGATT ACTTCAGTGT 
2601 TAGTTTTCTT AATTTTTTTT TTCCCCTATT TGTCCCTTGT CACTTTGTTG
2651 CAAGCTAGAA ATCTGTGGGT TATACATAGG GCAGCTCTTT GCGAAAGTGG
2701 TTTATTCCAC TGGAGAAAGG GGATTGAAAA TCAGTTAGAA CCAATGTATT
2751 TCTTGCCCCA CGGAACACTA TTCCTATAAG ATAGCTGAAA GAAGCTGCTG
2801 TGAGGAGCTC AGCTCCAACA CAGGATCAGC ACCTTGTATA GGAATTCCCA
2851 TGAATTATGA CTTCTCATTC TGTTTTATCA GAGTGCATAT ATGTCCTACT
2901 TCAGGAAAAG TAAAACAGTC ATTTACGAAA GAAAGTCAAT CTGTATCCTA
2951 AGCATTTTAA TAAAAAGTTA AAACAAAAAA AAAAAAAAAA AAAAAA



BLAST Results 



Entry HS679348 from database EMBL: 
human STS WI-16722.
Length = 265
Minus Strand HSPs:

Score = 1242 (186.4 bits), Expect = 2.8e-50, P = 2.8e-50
Identities = 260/265 (98%)



Medline entries 



94085558: 

Molecular analysis of SAR1-related cDNAs from a mouse
pituitary cell line. 



Peptide information for frame 3 



ORF from 117 bp to 710 bp; peptide length: 198 
Category: strong similarity to known protein



1 MSFIFEWIYN GFSSVLQFLG LYKKSGKLVF LGLDNAGKTT LLHMLKDDRL 

51 GQHVPTLHPT SEELTIAGMT FTTFDLGGHE QARRVWKNYL PAINGIVFLV 

101 DCADHSRLVE SKVELNALMT DETISNVPIL ILGNKIDRTD AISEEKLREI 

151 FGLYGQTTGK GNVTLKELNA RPMEVFMCSV LKRQGYGEGF RWLSQYID 
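The ORF coordinates and peptide lengths in these entries are related by simple codon arithmetic. An ORF from bp s to bp e (1-based, inclusive) spans (e - s + 1) / 3 codons, so the 117-710 ORF listed above gives (710 - 117 + 1) / 3 = 198. The listed end also appears to exclude the stop codon: here TGA occupies bp 711-713. A minimal sketch, assuming the standard genetic code (NCBI translation table 1); `orf_codons`, `translate_orf` and `SAR1_5PRIME` are hypothetical names, and the first 100 bp of the cDNA are replaced by N for brevity.

```python
from typing import Dict

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def orf_codons(start_bp: int, end_bp: int) -> int:
    """Codon count of an ORF given 1-based, inclusive cDNA coordinates."""
    length = end_bp - start_bp + 1
    if length <= 0 or length % 3:
        raise ValueError("ORF length must be a positive multiple of 3")
    return length // 3


def translate_orf(cdna: str, start_bp: int, end_bp: int) -> str:
    """Translate cdna[start_bp..end_bp] (1-based, inclusive), stopping at a stop codon."""
    region = cdna[start_bp - 1:end_bp].upper()
    protein = []
    for pos in range(0, len(region) - 2, 3):
        aa = CODON_TABLE.get(region[pos:pos + 3], "X")  # X for codons containing N etc.
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# bp 1-100 replaced by N; bp 101-150 copied from the sequence row starting "101".
SAR1_5PRIME = "N" * 100 + "GTAGTAAGCA" "TTAATAATGT" "CTTTCATCTT" "TGAGTGGATC" "TACAATGGCT"

if __name__ == "__main__":
    print(orf_codons(117, 710))                  # codons in the ORF listed above
    print(translate_orf(SAR1_5PRIME, 117, 146))  # first ten residues of the ORF
```

The same arithmetic applied to the other ORFs in this section (21-410, 253-1092 and 400-798) gives the listed peptide lengths of 130, 280 and 133.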



BLASTP hits 



Entry S39543 from database PIR:
GTP-binding protein - mouse
Length = 198

Score = 1029 (362.2 bits), Expect = 5.1e-104, P = 5.1e-104
Identities = 197/198 (99%), Positives = 198/198 (100%)

Entry SARA_MOUSE from database SWISSPROT:
GTP-BINDING PROTEIN SARA.
Length = 198

Score = 1012 (356.2 bits), Expect = 3.2e-102, P = 3.2e-102
Identities = 195/198 (98%), Positives = 196/198 (98%)

Entry CEZK180_4 from database TREMBL:

gene: "ZK180.4"; Caenorhabditis elegans cosmid ZK180.

Length = 193

Score = 679 (239.0 bits), Expect = 6.3e-67, P = 6.3e-67
Identities = 125/197 (63%), Positives = 161/197 (81%)
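The identity and positive percentages quoted with these BLASTP hits look like integer percentages that are truncated rather than rounded. For example, 196/198 is 98.99% and 161/197 is 81.73%, yet they are listed as 98% and 81%. A one-function sketch of that convention follows. It is an inference from the numbers in this section, not a documented BLAST rule, and `identity_percent` is a hypothetical name.

```python
def identity_percent(matches: int, aligned: int) -> int:
    """Percent of aligned positions that match, truncated toward zero to an integer."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# (matches, aligned length, percentage as listed) for the hits above
LISTED = [
    (197, 198, 99), (198, 198, 100), (195, 198, 98),
    (196, 198, 98), (125, 197, 63), (161, 197, 81),
]

if __name__ == "__main__":
    for matches, aligned, listed in LISTED:
        print(matches, aligned, identity_percent(matches, aligned), listed)
```

Integer floor division avoids floating-point edge cases when the ratio is exact, as with 198/198.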



Alert BLASTP hits for DKFZphfkd2_46m4, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_46m4, frame 3



Report for DKFZphfkd2_46m4.3



[LENGTH] 198

[MW] 22367.00

[pI] 6.21

[HOMOL] PIR:S39543 GTP-binding protein - mouse 1e-112

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YPL218w] 1e-58

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YPL218w] 1e-58

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR094w] 2e-23

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YPL051w] 4e-22

[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YDL192w] 3e-20

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR164c] 3e-19

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR138w] 2e-09

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YMR138w] 2e-09

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YHR168w] 7e-05

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YHR005c] 1e-04

[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YKL154w] 1e-04

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHR005c] 1e-04

[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YHR005c] 1e-04

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YKL154w] 1e-04

[FUNCAT] 08.19 cellular import [S. cerevisiae, YML001w] 3e-04

[BLOCKS] BL00395A Alanine racemase pyridoxal-phosphate attachment site proteins

[BLOCKS] BL01019B ADP-ribosylation factors family proteins

[BLOCKS] BL01019A ADP-ribosylation factors family proteins

[BLOCKS] BL01020D SAR1 family proteins

[BLOCKS] BL01020C SAR1 family proteins

[BLOCKS] BL01020B SAR1 family proteins

[BLOCKS] BL01020A SAR1 family proteins

[SCOP] d1plj 3.25.1.3.1 cH-p21 Ras protein [human (Homo sapiens) 7e-36

[SCOP] d1guaa 3.25.1.3.10 Rap1A [human (Homo sapiens) 8e-40

[SCOP] d1rrf_ 3.25.1.3.5 ADP-ribosylation factor 1 (ARF1) [rat (Rattu 2e-55

[SCOP] d1hurb_ 3.25.1.3.4 ADP-ribosylation factor 1 (ARF1) [human (Hom 1e-58

[SCOP] d1gota2 3.25.1.3.3 (1-54,171-326) Transducin (alpha subunit) [ra 2e-33

[SCOP] d1tadb2 3.25.1.3.2 (1-30,152-316) Transducin (alpha subunit 6e-36

[PIRKW] glycoprotein 4e-19

[PIRKW] monomer 1e-16

[PIRKW] P-loop 3e-64

[PIRKW] lipoprotein 4e-19

[PIRKW] GTP binding 3e-64

[SUPFAM] ADP-ribosylation factor 5e-22

[PROSITE] ATP_GTP_A 1

[PROSITE] MYRISTYL 3

[PROSITE] SAR1 1

[PROSITE] CK2_PHOSPHO_SITE 4

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 1

[PFAM] ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop)

[KW] Alpha_Beta

[KW] 3D



SEQ MSFIFEWIYNGFSSVLQFLGLYKKSGKLVFLGLDNAGKTTLLHMLKDDRLGQHVPTLHPT

1hurA TTTTTCCCCEEEEEETTTTCHHHHHHHHCCCCEEEEEEETTEE

SEQ SEELTIAGMTFTTFDLGGHEQARRVWKNYLPAINGIVFLVDCADHSRLVESKVELNALMT

1hurA EEEEEETTEEEEEEETTTTTTTCCCHHHHHHCEEEEEEEEETTTTTHHHHHHHHHHHHHH

SEQ DETISNVPILILGNKIDRTDAISEEKLREIFGLYGQTTGKGNVTLKELNARPMEVFMCSV

1hurA TTTTTTTEEEEEEETTTTTTTCCHHHHHHHHCGG

SEQ LKRQGYGEGFRWLSQYID

1hurA



Prosite for DKFZphfkd2_46m4.3

PS00001  162->166  ASN_GLYCOSYLATION  PDOC00001
PS00005  25->28    PKC_PHOSPHO_SITE   PDOC00005
PS00005  158->161  PKC_PHOSPHO_SITE   PDOC00005
PS00005  164->167  PKC_PHOSPHO_SITE   PDOC00005
PS00006  60->64    CK2_PHOSPHO_SITE   PDOC00006
PS00006  72->76    CK2_PHOSPHO_SITE   PDOC00006
PS00006  111->115  CK2_PHOSPHO_SITE   PDOC00006
PS00006  164->168  CK2_PHOSPHO_SITE   PDOC00006
PS00008  32->38    MYRISTYL           PDOC00008
PS00008  68->74    MYRISTYL           PDOC00008
PS00008  155->161  MYRISTYL           PDOC00008
PS00017  32->40    ATP_GTP_A          PDOC00017
PS01020  171->197  SAR1               PDOC00782
Pfam for DKFZphfkd2_46m4.3

HMM_NAME  ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop)

HMM          *GMgWfsIFrkMWGlWNKEMRILMLGLDNAGKTTILYMLKlgEIVTTIPT
               ++ FS++++++GL++K++++++LGLDNAGKTT+L+MLK++++ +++PT
Query     9   -YNGFSSVLQFLGLYKKSGKLVFLGLDNAGKTTLLHMLKDDRLGQHVPT  56

HMM           IGFNVETVeYKNIKFNVWDVGGQdsIRPYWRHYYpNTDGIIWVVDSaDRD
              +++++E++++ +++F+++D+GG++++R++W++Y P+++GI+++VD+AD++
Query    57   LHPTSEELTIAGMTFTTFDLGGHEQARRVWKNYLPAINGIVFLVDCADHS  106

HMM           RMeEaKqELHaMLNEEELrDAPlLIFAHKQDLPgAMSesEIREaLGLHel
              R+ E+K+EL+A++++E ++++P+LI++NK+D+ +A+SE+++RE+ GL+ +
Query   107   RLVESKVELNALMTDETISNVPILILGNKIDRTDAISEEKLREIFGLYGQ  156

HMM           RCn...........RPWYIQMCCAVtGEGLYEGMDWLSNYInkRkK*
              +++           RP++++MC++++++G++EG++WLS+YI
Query   157   TTGKGNVTLKELNARPMEVFMCSVLKRQGYGEGFRWLSQYI  197
DKFZphfkd2_47a4



group: transcription factor 

DKFZphfkd2_47a4.1 encodes a novel 280 amino acid protein with similarity to zinc finger
proteins.

The new protein is a putative transcription factor with one C2H2 zinc finger.

The new protein can find application in modulating/blocking the expression of genes controlled 
by this transcription factor. 



similarity to C.elegans F46B6.7 

potential frame shift at Bp 1092, will be checked (see BLASTX)
Sequenced by MediGenomix
Locus: map="7q31"
Insert length: 1756 bp 

Poly A stretch at pos. 1737, no polyadenylation signal found 



1 CCCTTTTCTT TTCTGCCGGG TAATGGCTGC TTCCAAGACC CAGGGGGCTG 

51 TCGCCCGAAT GCAGGAAGAC CGTGATGGGA GCTGCAGCAC AGTCGGGGGT 

101 GTAGGTTATG GGGTAAGGAT TGTATCCTGG AGCCGCTTTC CCTGCCAGAA 

151 AGTCCAGGTG GCACCACCAC TTTAGAAGGT TCTCCATCTG TGCCTTGTAT 

201 TTTCTGTGAA GAACATTTTC CTGTGGCTGA ACAAGACAAA CTTCTGAAGC 

251 ACATGATTAT TGAGCATAAG ATTGTCATAG CTGATGTCAA GTTGGTTGCT 

301 GATTTCCAAA GGTACATTTT ATATTGGAGG AAAAGGTTCA CTGAACAGCC 

351 CATCACAGAT TTTTGTAGTG TAATAAGAAT TAATTCCACT GCTCCATTTG 

401 AAGAACAAGA GAATTATTTT TTGTTATGTG ACGTTTTACC AGAAGATAGA 

451 ATTCTTAGAG AAGAGCTTCA GAAACAGAGA CTGAGAGAAA TTCTGGAACA 

501 ACAGCAGCAA GAACGAAATG ATAACAATTT TCATGGCGTT TGTATGTTTT 

551 GCAATGAAGA ATTCCTTGGA AACAGATCTG TTATTTTGAA CCACATGGCC 

601 AGAGAACATG CTTTCAACAT TGGATTGCCA GACAACATTG TAAACTGCAA 

651 TGAATTTTTG TGTACATTAC AGAAAAAGCT TGACAATTTG CAGTGCTTGT 

701 ACTGTGAGAA GACCTTCAGG GGCAAAAATA CACTTAAAGA TCACATGAGG 

751 AAAAAACAGC ATCGTAAGAT TAATCCTAAG AACAGAGAAT ATGACAGATT 

801 TTATGTCATC AATTATTTGG AACTTGGAAA ATCGTGGGAG GAAGTTCAGT 

851 TGGAAGATGA TCGGGAGTTG CTGGACCATC AGGAAGATGA CTGGTCTGAT 

901 TGGGAAGAAC ACCCTGCCTC TGCAGTCTGC TTATTTTGTG AAAAGCAAGC 

951 AGAAACAATT GAGAAGTTGT ATGTCCACAT GGAGGATGCA CACGAATTTG 

1001 ATCTTCTCAA AATAAAGTCA GAACTTGGAT TAAATTTCTA TCAGCAAGTG 

1051 AAACTGGTCA ATTTTATTCG GAGGCAAGTT CACCAATGCA GATGATGGCT 

1101 GCCATGTGAA GTTCAAATCC AAAGCAGACT TAAGAACTCA CATGGAAGAA 

1151 ACTAAACACA CTTCGCTGCT CCCCGATAGA AAGACGTGGG ATCAACTGGA 

1201 GTATTATTTT CCAACCTATG AAAATGACAC TCTCCTGTGT ACACTATCTG 

1251 ACAGTGAAAG TGACCTGACA GCTCAGGAAC AAAATGAAAA TGTTCCCATC 

1301 ATCAGTGAAG ATACATCTAA ACTGTATGCT TTGAAACAAA GCAGTATTTT 

1351 GAACCAGTTG CTACTATAAG AGTACTTGAA AACCTAGAAG AAACTACCAC 

1401 AGAAGCAATT TTTCATGTTT TTCTCCTATG AGACAGATAT GAAAGAACAA 

1451 TTTAAATTTG AACATCAACA AAAGATTGGT CCTTGGTGAA ATAAACTTTT 

1501 CAAAAATGAA TGTTCTTTTC AAAAAATAAA GTAGAAAAAT GCACTTACTA 

1551 AGAACATGAA AAAAAAATGA AGTAGGAAAA TAAGATGAAG ACTTTGTATT 

1601 TTGGCTGTAA AGTTTTATTG TGTGATCATC TTAAATTATC TCACTTCATT 

1651 AAACTCATAA TTATATATAG AAGTATATGT CAATTACAAA GAAATGAAAT 

1701 GTTCAAATTA TTTATAAACC TGATTTTTCA ATCAGCGAAA AAAAAAAAAA 

1751 AAAAAA 



BLAST Results 



Entry AC004112 from database EMBL:

Homo sapiens BAC clone RG313E03 from 7q31, complete sequence.
Score = 2660, P = 3.0e-241, identities = 534/535
> 10 exons

Entry AC004111 from database EMBL:

Homo sapiens BAC clone RG103H13 from 7q31, complete sequence.
Score = 598, P = 5.8e-17, identities = 128/137
1 exon 



Medline entries 
No Medline entry



Peptide information for frame 1 



ORF from 253 bp to 1092 bp; peptide length: 280 
Category: similarity to unknown protein 



1 MIIEHKIVIA DVKLVADFQR YILYWRKRFT EQPITDFCSV IRINSTAPFE 
51 EQENYFLLCD VLPEDRILRE ELQKQRLREI LEQQQQERND NNFHGVCMFC 
101 NEEFLGNRSV ILNHMAREHA FNIGLPDNIV NCNEFLCTLQ KKLDNLQCLY 
151 CEKTFRGKNT LKDHMRKKQH RKINPKNREY DRFYVINYLE LGKSWEEVQL 
201 EDDRELLDHQ EDDWSDWEEH PASAVCLFCE KQAETIEKLY VHMEDAHEFD 
251 LLKIKSELGL NFYQQVKLVN FIRRQVHQCR 

BLASTP hits 

Entry CEF46B6_6 from database TREMBLNEW:

product: "F46B6.7"; Caenorhabditis elegans cosmid F46B6

>TREMBL:CEF46B6_6 product: "F46B6.7"; Caenorhabditis elegans cosmid

F46B6

Score = 630, P = 1.1e-61, identities = 123/289, positives = 183/289
Entry AF059531_1 from database TREMBLNEW:

gene: "PRMT3"; product: "protein arginine N-methyltransferase 3"; Homo

sapiens protein arginine N-methyltransferase 3 (PRMT3) mRNA, partial

cds. >TREMBL:AF059531_1 gene: "PRMT3"; product: "protein arginine

N-methyltransferase 3"; Homo sapiens protein arginine

N-methyltransferase 3 (PRMT3) mRNA, partial cds.

Score = 120, P = 1.5e-04, identities = 23/78, positives = 42/78

Entry YB9M_YEAST from database SWISSPROT:

34.7 KD PROTEIN IN SHM1-MRPL37 INTERGENIC REGION.

Score = 112, P = 4.6e-04, identities = 43/165, positives = 71/165



Alert BLASTP hits for DKFZphfkd2_47a4, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_47a4, frame 1

Report for DKFZphfkd2_47a4.1

[LENGTH] 280 

[MW] 33921.94 

[pI] 5.63

[HOMOL] TREMBL:CEF46B6_5 gene: "F46B6.7"; Caenorhabditis elegans cosmid F46B6 1e-56

[BLOCKS] BL01032B Protein phosphatase 2C proteins

[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins

[PROSITE] MYRISTYL 1

[PROSITE] ZINC_FINGER_C2H2 1

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 2

[PROSITE] ASN_GLYCOSYLATION 2 

[PFAM] Zinc finger, C2H2 type 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 8.21 % 

SEQ MIIEHKIVIADVKLVADFQRYILYWRKRFTEQPITDFCSVIRINSTAPFEEQENYFLLCD 

SEG 

PRD cccccceeehhhhhhhhhhhhhhhhhhhhhhhcccceeeeeeccccccchhhhheeeecc 

SEQ VLPEDRILREELQKQRLREILEQQQQERNDNNFHGVCMFCNEEFLGNRSVILNHMAREHA 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhcccceeeeeeccccccccceeeehhhhhhhh 

SEQ FNIGLPDNIVNCNEFLCTLQKKLDNLQCLYCEKTFRGKNTLKDHMRKKQHRKINPKNREY

SEG

PRD hcccccccccchhhhhhhhhhhhhhhhheeecccccccchhhhhhhhhhhcccccccccc

SEQ DRFYVINYLELGKSWEEVQLEDDRELLDHQEDDWSDWEEHPASAVCLFCEKQAETIEKLY

SEG

PRD ceeeeeeeeccccchhhhhhhhcchhhhhhcccccccccccccccchhhhhhhhhhhhhh

SEQ VHMEDAHEFDLLKIKSELGLNFYQQVKLVNFIRRQVHQCR

SEG

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhcccc
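The [KW] LOW_COMPLEXITY figure appears to be the share of residues masked with "x" in the SEG rows. For this 280-residue peptide, 8.21% corresponds to 23 masked positions. The 14.62% given for the 130-residue pterin-4a-carbinolamine dehydratase-like peptide earlier in this section corresponds to 19. A minimal sketch follows, assuming the mask is simply counted and the result rounded to two decimals. `low_complexity_percent` is a hypothetical name, and the masks in the usage lines are synthetic: only the number of "x" characters matters.

```python
def low_complexity_percent(seg_mask: str, length: int) -> float:
    """Share of residues flagged 'x' by SEG, as a percentage rounded to 2 decimals."""
    if length <= 0:
        raise ValueError("length must be positive")
    return round(100 * seg_mask.count("x") / length, 2)


if __name__ == "__main__":
    # Synthetic masks: 23 masked residues of 280, and 19 of 130.
    print(low_complexity_percent("x" * 23, 280))
    print(low_complexity_percent("." + "x" * 19, 130))
```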



Prosite for DKFZphfkd2_47a4.1

PS00001  44->48    ASN_GLYCOSYLATION  PDOC00001
PS00001  107->111  ASN_GLYCOSYLATION  PDOC00001
PS00004  27->31    CAMP_PHOSPHO_SITE  PDOC00004
PS00005  154->157  PKC_PHOSPHO_SITE   PDOC00005
PS00005  160->163  PKC_PHOSPHO_SITE   PDOC00005
PS00006  160->164  CK2_PHOSPHO_SITE   PDOC00006
PS00006  194->198  CK2_PHOSPHO_SITE   PDOC00006
PS00006  215->219  CK2_PHOSPHO_SITE   PDOC00006
PS00007  178->185  TYR_PHOSPHO_SITE   PDOC00007
PS00007  13->22    TYR_PHOSPHO_SITE   PDOC00007
PS00008  124->130  MYRISTYL           PDOC00008
PS00028  148->171  ZINC_FINGER_C2H2   PDOC00028



Pfam for DKFZphfkd2_47a4.1

HMM_NAME  Zinc finger, C2H2 type

HMM          *CpwPDCgKtFrrwsNLrRHMR..T.H*
              C +  C+KTFR + +L+ HMR    H
Query   148   CLY--CEKTFRGKNTLKDHMRKK-QH  170
DKFZphfkd2_4b6



group: kidney derived 

DKFZphfkd2_4b6 encodes a novel 133 amino acid protein with similarity to Homo sapiens clone
25003 partial CDS.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of kidney-specific
genes.



similarity to Homo sapiens clone 25003 

complete cDNA, complete cds, few EST hits 

Sequenced by GBF 

Locus: unknown

Insert length: 1936 bp

Poly A stretch at pos. 1916, polyadenylation signal at pos. 1890



1 GGGAGACTTG CAATGAAGTT AGAATGAACA GGAGGAGTCT GCAGCTTTTC 
51 AGTGCCTGGG ATAACTATAG TTTAAAGATC ATTGTGTAAA ATAGGATTTT 
101 TAGTCAGCAT GCATTGTTTT AAACCGACTA ACTGATAGCC TAAAACTTTA 
151 TTTTTGCATT TTGCCAATCC TTGGAGTTTT GTTTTGCAGA ATTAAGAAAA 
201 AAATGAATGT ATGATCATCT GAAAAGGGCT TTCTCTCAAT CCCACTTCAT 
251 GGCATGACCT CTGCTGGATC ATTAGTTCTA GCCAGAGAAG TAGCAAAGGA 
301 ACATGACGTC TGAGACCTCC CTTCCCTCAT CAGTGGGGCT GACTGAGCTG 
351 GGGGCTTGAA GCCGGAGGTA ACCTTTCCTG TCGAATGTTT CTTTAGAGAA 
401 TGGCAATGGT CTCTGCGATG TCCTGGGTCC TGTATTTGTG GATAAGTGCT
451 TGTGCAATGC TACTCTGCCA TGGATCCCTT CAGCACACTT TCCAGCAGCA 
501 TCACCTGCAC AGACCAGAAG GAGGGACGTG TGAAGTGATA GCAGCACACC 
551 GATGTTGCAA CAAGAATCGC ATTGAGGAGC GGTCACAAAC AGTAAAGTGT 
601 TCCTGTCTAC CTGGAAAAGT GGCTGGAACA ACAAGAAACC GGCCTTCTTG 
651 CGTCGATGCC TCCATAGTGA TTTGGAAATG GTGGTGTGAG ATGGAGCCTT 
701 GCCTAGAAGG AGAAGAATGT AAGACACTCC CTGACAATTC TGGATGGATG 
751 TGCGCAACAG GCAACAAAAT TAAGACCACG AGAATTCACC CAAGAACCTA 
801 ACAGAAGCAT TTGTGGTAGT AAAGGAAAAC CAACCCTCTG GAAAATACAT 
851 TTTGAGAATC TCAAACATCT CACATATATA CAAGCCAAAT GGATTTCTTA 
901 CTTGCACTTT GACTGGCTAC CAGATAATCA CAGTGCGTTT AGTGTGTGTA 
951 ACGAAATATC CTACAGTGAG AAGACACAGC GTTTTGGCAT CACCATGGAA 
1001 AGTGGGCTTA AAAAAGGGTC TTCTCAGTGA AATTTTTGGG CATCATGAAG 
1051 AACGATCAAC TATCTTCTAA TTTGAATCTA TAGTTACTTT GTACCATTTG 
1101 AAATATATGT ATATATATAT ATATAATATT TTGAAATATT ATCTATTCTC 
1151 TTCAAGAAAT GAACAGTACC ACAGTTTGAG ACGGCTGGTG TACCCCTTTG 
1201 AGTTTTGGAT GTTTTGTCTG TTTTGCTTTG TTTTGTTAGT CATTTCTTTT 
1251 TCTAACGGCA AGGAAGATAT GTGCCCTTTT GAGAATTCAA GATGGCACTG 
1301 ACACGGGAAG GCCAGCTACA GGTGGACTCC TGGAATTTGA GGCATCATAA 
1351 TGATACTGAA TCAAGAACTT CCTTCTGCTT CTACCAGATG GCCCAAGGAA 
1401 GCACATCGTC CTGTTTTATT GCTTTCTACC CTGTGCAATA TTAGCATGCA 
1451 AGCTTGGCTT ACATAGTCAT ACTTTATATT CAATTGATAT ATAATAACCG 
1501 TTCTAACCTC TTCCAGGAAA ATATTTTTAG AACTACTAGC TTTTCCACTT 
1551 AGAAGAAAAT GAGGATTCTT AAGGGAGCCA CTCCACCATG CTATTAAGAC 
1601 TCTGGCAGAG TTATGGGTAG GATATGGATC CCTACATGAA TAAGTCCTGT 
1651 AAATACAATG TCTTAAGGCT TTGTATAGCT GTCCTAGACT GCAGAAATGT 
1701 CCTCTGATTA AATCCAAAGT CTGGCATCGT TAACTACATA GTGCTGTAGC 
1751 AACAAGTCTT ATCATGGCAT CTCTTTCTAT GTTTGGTTTG CTTTTTCCAA 
1801 GAGTATTCAG GTCTCCTCTT GTGAGATAGG AAGGCCATGA AAACAATTAG 
1851 ATTTCAAGAT GATCTATGTG ACCAAATGTT GGACAGCCCT ATTAAAGTGG 
1901 TAAACAACTT CTTTCTAAAA AAAAAAAAAA AAAAAA 
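The "polyadenylation signal at pos." values in these entries appear to be 0-based offsets of a canonical hexamer. In this clone, the ATTAAA in the row beginning 1851 starts at bp 1891 (1-based), one more than the listed 1890. The same one-base offset separates the AATAAA hexamers at bp 5599 and 2959 from the values of 5598 and 2958 listed for the two MediGenomix clones earlier in this section. The sketch below makes two assumptions: only AATAAA and ATTAAA are searched, and only within the last 60 bases. The 60-base window is an arbitrary choice, not that of the original annotation software, and `find_polya_signal` is a hypothetical name.

```python
from typing import Optional

# Canonical hexamer and its most common variant (assumed search set).
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signal(cdna: str, window: int = 60) -> Optional[int]:
    """0-based start of the 3'-most signal hexamer in the last `window` bases, or None."""
    seq = cdna.upper()
    region_start = max(0, len(seq) - window)
    region = seq[region_start:]
    best = max(region.rfind(signal) for signal in POLYA_SIGNALS)
    return None if best < 0 else region_start + best


if __name__ == "__main__":
    # bp 1-1850 of this clone replaced by N; bp 1851-1936 copied from the last two rows.
    tail = "".join(["ATTTCAAGAT", "GATCTATGTG", "ACCAAATGTT", "GGACAGCCCT",
                    "ATTAAAGTGG", "TAAACAACTT", "CTTTCT"]) + "A" * 20
    clone = "N" * 1850 + tail
    print(len(clone), find_polya_signal(clone))
```

The search is restricted to the 3' end because the same hexamers also occur by chance far upstream in longer inserts.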



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 
Peptide information for frame 1



ORF from 400 bp to 798 bp; peptide length: 133 
Category: similarity to unknown protein 
Classification: no clue 

1 MAMVSAMSWV LYLWISACAM LLCHGSLQHT FQQHHLHRPE GGTCEVIAAH 
51 RCCNKNRIEE RSQTVKCSCL PGKVAGTTRN RPSCVDASIV IWKWWCEMEP 
101 CLEGEECKTL PDNSGWMCAT GNKIKTTRIH PRT 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4b6, frame 1

TREMBLNEW:AF131851_1 product: "Unknown"; Homo sapiens clone 25003 mRNA
sequence, partial cds., N = 1, Score = 242, P = 1.7e-20

>TREMBLNEW:AF131851_1 product: "Unknown"; Homo sapiens clone 25003 mRNA
sequence, partial cds.
Length = 165

HSPs:

Score = 242 (36.3 bits), Expect = 1.7e-20, P = 1.7e-20
Identities = 44/89 (49%), Positives = 58/89 (65%)



Query: 42 GTCEVIAAHRCCNKNRIEERSQTVKCSCLPGKVAGTTRNRPSCVDASIVIWKWWCEMEPC 101 

GTCE++ R ++ R QT +C+C G++AGTTR RP+CVDA I+ K WC+M PC

Sbjct: 76 GTCEIVTLDRDSSQPRRTIARQTARCACRKGQIAGTTRARPACVDARIIKTKQWCDMLPC 135 

Query: 102 LEGEECKTLPDNSGWMCAT-GNKIKTTRI 129 

LEGE C L + SGW C G +IKTT + 
Sbjct: 136 LEGEGCDLLINRSGWTCTQPGGRIKTTTV 164 

Pedant information for DKFZphfkd2_4b6, frame 1



Report for DKFZphfkd2_4b6.1

[LENGTH] 133

[MW] 15030.64

[pI] 8.49

[HOMOL] TREMBLNEW:AF131851_1 product: "Unknown"; Homo sapiens clone 25003 mRNA 

sequence, partial cds. 4e-20 

[KW] Alpha_Beta 

[KW] SIGNAL_PEPTIDE 26 



SEQ MAMVSAMSWVLYLWISACAMLLCHGSLQHTFQQHHLHRPEGGTCEVIAAHRCCNKNRIEE 

PRD ccchhhhhhhhhhhhhhhhhhhhccccchhhhhhhcccccccceeeeeeecccccchhhh 

SEQ RSQTVKCSCLPGKVAGTTRNRPSCVDASIVIWKWWCEMEPCLEGEECKTLPDNSGWMCAT 

PRD hhhhhhccccccccccccccccccceeeeeehhhhhhccccccccceeeecccccceeec 

SEQ GNKIKTTRIHPRT 

PRD ccccccccccccc 



(No Prosite data available for DKFZphfkd2_4b6.1)
(No Pfam data available for DKFZphfkd2_4b6.1)
DKFZphfkd2_4c8
group: kidney derived 

DKFZphfkd2_4c8 encodes a novel 153 amino acid protein with partial similarity to the
huntingtin-associated protein HAP1.

The new protein contains a leucine zipper, a motif involved in protein-protein interactions.
No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of kidney-specific
genes.

similarity to KIAA0549 and HAP1 

potential frame shift at Bp ~1350-1500, will be checked

Sequenced by GBF

Locus: unknown

Insert length: 3182 bp 

Poly A stretch at pos. 3162, polyadenylation signal at pos. 3135 

1 GGGCTTCCCC CATAGAATTT TTCTTTTCAT TGCCCACTTT ACTGTTTTGG 

51 CTCCAGACTG TCGTTAAGAA TGTACAGCCT AATTCTGGTG TGTTTCGGGA 

101 TATTCTTCTG TCCAGTATTC TGGAAGGGCG GGGAGGCATG GCAGCGTTTT 

151 ACTTGACGTT GATGGTGCTG TGAAGTCCAT TCTTTCCTCT GCAAGACTAC 

201 TGACTATGCA GAAATTTATC GAAGCGGATT ATTATGAACT AGACTGGTAT 

251 TATGAAGAAT GCTCGGATGT TTTATGTGCT GAAAGAGTTG GCCAGATGAC 

301 TAAGACATAT AATGACATAG ATGCTGTCAC TCGGCTTCTT GAGGAGAAAG 

351 AGCGGGATTT AGAATTGGCC GCTCGCATCG GCCAGTCGTT GTTGAAGAAG 

401 AACAAGACCC TAACCGAGAG GAACGAGCTG CTGGAGGAGC AGGTGGAACA 

451 CATCAGGGAG GAGGTGTCTC AGCTCCGGCA TGAGCTGTCC ATGAAGGATG 

501 AGCTGCTTCA GTTCTACACC AGCGCAGCGG AGGAGAGTGA GCCCGAGTCC 

551 GTTTGCTCAA CCCCGTTGAA GAGGAATGAG TCGTCCTCCT CAGTCCAGAA 

601 TTACTTTCAT TTGGATTCTC TTCAAAAGAA GCTGAAAGAC CTTGAAGAGG 

651 AGAATGTTGT ACTTCGATCC GAGGCCAGCC AGCTGAAGAC AGAGACCATC 

701 ACCTATGAGG AGAAGGAGCA GCAGCTGGTC AATGACTGCG TGAAGGAGCT 

751 GAGGGATGCC AATGTCCAGA TTGCTAGTAT CTCAGAGGAA CTGGCCAAGA 

801 AGACGGAAGA TGCTGCCCGC CAGCAAGAGG AGATCACACA CCTGCTATCG 

851 CAAATAGTTG ATTTGCAGAA AAAGGCAAAA GCTTGCGCAG TGGAAAATGA 

901 AGAACTTGTC CAGCATCTGG GGGCTGCTAA GGATGCCCAG CGGCAGCTCA 

951 CAGCCGAGCT GCGTGAGCTG GAGGACAAGT ACGCAGAGTG CATGGAGATG 

1001 CTGCATGAGG CGCAGGAGGA GCTGAAGAAC CTCCGGAACA AAACCATGCC 

1051 CAATACCACG TCTCGGCGCT ACCACTCACT GGGCCTGTTT CCCATGGATT 

1101 CCTTGGCAGC AGAGATTGAG GGAACGATGC GCAAGGAGCT GCAGTTGGAA 

1151 GAGGCCGAGT CTCCAGACAT CACTCACCAG AAGCGTGTCT TTGAGACAGT 

1201 AAGAAACATC AACCAGGTTG TCAAGCAGAG ATCTCTGACC CCTTCTCCCA 

1251 TGAACATCCC CGGCTCCAAC CAGTCCTCGG CCATGAACTC CCTCCTGTCC 

1301 AGCTGCGTCA GCACCCCCCG GTCCAGCTTC TACGGCAGCG ACATAGGCAA 
1351 CGTCGTCCTC GACAACAAGA CCAACAGCAT CATTCTGGAA ACAGAGGCAG 

1401 CCGACCTGGG AAACGATGAG CGGAGTAAGA AGCCGGGGAC GCCGGGCACC 

1451 CCCAGGCTCC CACGACCTGG AGACGGCGCT GAGGCGGCTG TCCCTGCGCC 
1501 GGGAGAACTA CCTCTCGGAG AGGAGGTTCT TTGAGGAGGA GCAAGAGAGG 

1551 AAGCTCCAGG AGCTGGCGGA GAAGGGCGAG CTGCGCAGCG GCTCCCTCAC 

1601 ACCCACTGAG AGCATCATGT CCCTGGGCAC GCACTCCCGC TTCTCCGAGT 
1651 TCACCGGCTT CTCTGGCATG TCCTTCAGCA GCCGCTCCTA CCTGCCTGAG 

1701 AAGCTCCAGA TCGTGAAGCC GCTGGAAGGT GATCACGCGG GGCCTCGGCC 

1751 CCTCTCTGTC CTCCTGGGGG ACTCCCTTTG GTCCCTGATC CACCTGCGGA 

1801 AGGCGGGGCA CCTCTGTCAC GCCTACTCCT TTTTCTTCCG CGACAGCCAC 

1851 CCGCGCTGCT GGTTTGAGTT CCTCTGAGGG TGGTGCTCAG CCTAGGCCTC 
1901 CGTCCCTCCC CTCTGGCTGG CAGGTGTGAC AATGCACACA TAGGCCATGA 

1951 AACTCGCCGA GGAAAGACAA GCATGTGCAC TGTGGTCTTC TAGTTCTTTC 

2001 CTTTGCCTTT AGAACCTTAG AAATAAAAAC TTTTGTGGCG GTAGAGGCAC 

2051 TGCTAACTGA TTCAAAAATT AATTAGGTTT TGCCTGTGGG TGTGAGGAAT 

2101 GCAGAAAATT AATGCTTTAG CTTTTCTGCA GTTTTGGTGT CGGGGAGAGG 

2151 TTCCAAGCAA ACTCTATTAA ATGGGGATTT TTTTTTCCCC ATAACCACCT 

2201 GAATGTGATT TGTGGGCTTA TGTGTTCTGA TTTGAACTTC ATATAGCAAG 

2251 GTTGTGGCTT TTGGCAGATG CAGTATGTTC TGAGCGCGGC TCCTAGAGTC 

2301 TACAATTTGG AGTCCAGGAA GGGGTGGCTG TGGAGACAAG TGAGTTTTGT 

2351 ACCTCCGTAA GCCACCCTTT TTCAGGGTCA GTTCATGTGT TAGTATCAGG 

2401 GGCATCTCAG ATGATTAAAC TCATGGGAAA AACTTCCTCC TTCCCTCTCT 

2451 CCCTCTTGCC CTCCTGCCTC TTTTTTTTTT TTTTTTTTTT AATTTGGGCA 

2501 CTTATAAAAT GTTTTCCCTC TACCTGCTGC TACTCTGCCA AGAGCCACCA 

2551 AGTGCTTATA TTTTTCATTT TTTACTCCTT TAGTTTGGAA AGCCATATAC 

2601 GTTTGAGAAG GTGTTTTAAA ACTCTGTGTT ACACTTACGA TGCAAAGCCA 

2651 AATCAGAACT TCTGTAAGGC AGAACTTTCC CAACTTTAAA AAAATTATTG 
2701 TCCCCTCTAG GAGCCTTCTT AGACGTTTTT TCCTAATCAC CCCCCAAAGA 
2751 CATTTTAATA CCACATATAT ATTGTTTATG TACTATATGT ATATACATAA 
2801 ACAATACATA AGCAATACAT CTGTGGTATT AAAATTAAAA AGAATCCAAT 
2851 TATGTTTACC TCAAAAGAAC CTGTTTTTGC TTCTTGGGAG CAATATTGCC 
2901 CCTGTGAGAC TGCATGCTAT AAGGTAAGGT TGTGCTTGTT AAAGACCCAA 
2951 GACATGACTG GGTTCCACAG TCTCCAAAGG AAGAGGGTGG GCTAGTTTGT 
3001 TTTTATTATT ATTTTAAAAT TGTATAATTG GGGTCTTTCT TAGAGTTCAG 
3051 AAAAGGTATA GCTTACTCTT TTTTAATTGT TTATTTAGTT GTAAGCTTAG 
3101 TGATTGTTTT CTGATCCACA TTGTGTGTGT TCTTCAATAA AATCTTTCAT 
3151 TTCTGCAATT TTAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 206 bp to 1531 bp; peptide length: 442 
Category: similarity to known protein 
Classification: unset 

Prosite motifs: LEUCINE ZIPPER (139-161) 



1 MQKFIEADYY ELDWYYEECS DVLCAERVGQ MTKTYNDIDA VTRLLEEKER 
51 DLELAARIGQ SLLKKNKTLT ERNELLEEQV EHIREEVSQL RHELSMKDEL 
101 LQFYTSAAEE SEPESVCSTP LKRNESSSSV QNYFHLDSLQ KKLKDLEEEN 
151 VVLRSEASQL KTETITYEEK EQQLVNDCVK ELRDANVQIA SISEELAKKT 
201 EDAARQQEEI THLLSQIVDL QKKAKACAVE NEELVQHLGA AKDAQRQLTA 
251 ELRELEDKYA ECMEMLHEAQ EELKNLRNKT MPNTTSRRYH SLGLFPMDSL 
301 AAEIEGTMRK ELQLEEAESP DITHQKRVFE TVRNINQVVK QRSLTPSPMN 
351 IPGSNQSSAM NSLLSSCVST PRSSFYGSDI GNVVLDNKTN SIILETEAAD 
401 LGNDERSKKP GTPGTPRLPR PGDGAEAAVP APGELPLGEE VL 
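The ORF coordinates and peptide lengths in these reports agree if the end coordinate is read as the last base of the last sense codon (stop codon excluded) and frames are counted from base 1 of the insert. In DKFZphfkd2_4kl4, for example, the bases 1218-1220 immediately after its ORF end (1217) read TGA. A small Python check under those two assumptions; `orf_summary` is an illustrative name only:

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (reading frame, peptide length) for an ORF whose 1-based,
    inclusive coordinates run from the start codon to the last sense codon."""
    span = end - start + 1
    if span % 3:
        raise ValueError(f"ORF {start}-{end} is not a whole number of codons")
    frame = (start - 1) % 3 + 1  # assumption: frame 1 begins at base 1 of the insert
    return frame, span // 3


# (start, end) -> (frame, peptide length) as printed in the reports.
reported = {
    (206, 1531): (2, 442),   # DKFZphfkd2_4c8, frame 2
    (1416, 1874): (3, 153),  # DKFZphfkd2_4c8, frame 3
    (456, 1217): (3, 254),   # DKFZphfkd2_4kl4, frame 3
    (183, 659): (3, 159),    # DKFZphfkd2_4mll, frame 3
}
for (s, e), expected in reported.items():
    assert orf_summary(s, e) == expected, (s, e)
```

The same coordinates show the frame 2 and frame 3 ORFs of DKFZphfkd2_4c8 overlapping between bases 1416 and 1531, close to the bp 1350-1500 region flagged for a possible frame shift.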



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4c8, frame 2 

PIR:S72555 huntingtin-associated protein HAP1 - human (fragment), N = 
1, Score = 234, P = 8.6e-19 

TREMBL:CEUT27A3_7 gene: "T27A3.1"; Caenorhabditis elegans cosmid 
T27A3., N = 1, Score = 226, P = 9.9e-16 

PIR:S67495 huntingtin-associated protein HAP1-A - rat, N = 1, Score = 
215, P = 1.6e-14 



>PIR:S72555 huntingtin-associated protein HAP1 - human (fragment) 
Length = 320 

HSPs: 



Score = 234 (35.1 bits), Expect = 8.6e-19, P = 8.6e-19 
Identities = 66/189 (34%), Positives = 110/189 (58%) 



Query: 109 EESEPESVCSTPLKRNE--SSSSVQNYFH---LDSLQKKLKDLEEENVVLRSEASQLKTE 163 

EE+E + C+ P + S ++ + H L++LQ+KL+ LEEEN LR EASQL T 

Sbjct: 28 EEAEEDLQCAHPCDAPKLISQEALLHQHHCPQLEALQEKLRLLEEENHQLREEASQLDT- 86 


Query: 164 TITYEEKEQQLVNDCVKELRDANVQIASISEELAKKTEDAARQQEEITHLLSQIVDLQKK 223 

E++EQ L+ +CV++ +A+ Q+A +SE L + E+ RQQ+E+ L +Q++ LQ++ 

Sbjct: 87 ---LEDEEQMLILECVEQFSEASQQMAELSEVLVLRLENYERQQQEVARLQAQVLKLQQR 143 


Query: 224 AKACAVENEELVQHLGAAKDAQRQLTAE--LRELEDKYAECME--MLHEAQEELKNL-RN 278 

+ E E+L +L+K+QQLEL ++AE+ + +++RN 

Sbjct: 144 CRMYGAETEKLQKQLASEKEIQMQLQEEETLPGFQETLAEELRTSLRRMISDPVYFMERN 203 

Query: 279 KTMP--NTTSRRY 289 

MP +T+S RY 
Sbjct: 204 YEMPRGDTSSLRY 216 
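The percentages printed after Identities and Positives can be reproduced from the counts by integer division, which truncates rather than rounds: 66/189 is 34.9% but is printed as 34%. A Python sketch checking every HSP in this section under that assumption; it makes no claim about how a particular BLAST version computes the figure internally, and `blast_percent` is a hypothetical name:

```python
def blast_percent(matches: int, length: int) -> int:
    """Integer percentage as printed next to Identities/Positives.

    Integer division truncates, which reproduces every figure in this
    section (rounding would print 35% for 66/189).
    """
    return 100 * matches // length


# (matches, alignment length, printed percentage) taken from this section.
reported = [
    (44, 89, 49), (58, 89, 65),      # DKFZphfkd2_4b6 vs AF131851_1
    (66, 189, 34), (110, 189, 58),   # DKFZphfkd2_4c8 frame 2 vs HAP1
    (57, 98, 58), (69, 98, 70),      # DKFZphfkd2_4c8 frame 3 vs KIAA0549
    (186, 208, 89), (190, 208, 91),  # DKFZphfkd2_4kl4 vs Rab6
    (38, 144, 26), (72, 144, 50),    # DKFZphfkd2_4mll vs YMR034c
]
for m, n, shown in reported:
    assert blast_percent(m, n) == shown, (m, n)
```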



Peptide information for frame 3 



ORF from 1416 bp to 1874 bp; peptide length: 153 
Category: similarity to known protein 
Classification: unset 



1 MSGVRSRGRR APPGSHDLET ALRRLSLRRE NYLSERRFFE EEQERKLQEL 
51 AEKGELRSGS LTPTESIMSL GTHSRFSEFT GFSGMSFSSR SYLPEKLQIV 
101 KPLEGDHAGP RPLSVLLGDS LWSLIHLRKA GHLCHAYSFF FRDSHPRCWF 
151 EFL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4c8, frame 3 

TREMBL:AB011121_1 gene: "KIAA0549"; product: "KIAA0549 protein"; Homo 
sapiens mRNA for KIAA0549 protein, partial cds., N = 1, Score = 252, P 
= 5.5e-21 



>TREMBL:AB011121_1 gene: "KIAA0549"; product: "KIAA0549 protein"; Homo 
sapiens mRNA for KIAA0549 protein, partial cds. 
Length = 469 

HSPs: 

Score = 252 (37.8 bits), Expect = 5.5e-21, P = 5.5e-21 
Identities = 57/98 (58%), Positives = 69/98 (70%) 

Query: 8 GRRAPPGSHDLETALRRLSLRRENYLSERRFFEEEQERKLQELAEKGELRSGSLTPTESI 67 

G+ P G DL TAL RLSLRR+NYLSE++FF EE +RK+Q LA++ E SG +TPTES+ 
Sbjct: 27 GQPGPSGDSDLATALHRLSLRRQNYLSEKQFFAEEWQRKIQVLADQKEGVSGCVTPTESL 86 

Query: 68 MSLGTHSRFSEFTGFSGMSFSSRSYLPEKLQIVKPLEG 105 

SL T SE T S S R ++PEKLQIVKPLEG 
Sbjct: 87 ASLCTTQ--SEITDLSSAS-CLRGFMPEKLQIVKPLEG 121 



Pedant information for DKFZphfkd2_4c8, frame 2 



Report for DKFZphfkd2_4c8.2 



[LENGTH] 442 

[MW] 50020.14 

[pI] 4.77 

[HOMOL] TREMBL:AF040723_1 product: "neuroanl"; Homo sapiens neuroanl mRNA, complete cds. 5e-29 

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 5e-08 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YIL149c] 5e-08 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 5e-08 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YIL138c] 6e-08 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR130c] 2e-07 

[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 1e-06 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 1e-06 

[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 1e-06 

[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-06 

[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-06 

[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 4e-06 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 4e-06 

[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 2e-05 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 2e-05 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 5e-05 

[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 5e-05 

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YNL079c] 5e-05 

[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c] 1e-04 

[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 1e-04 

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YNL272c] 3e-04 

[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YNL272c] 3e-04 

[BLOCKS] BL01289B 

[BLOCKS] BL00415M Synapsins proteins 

[EC] 3.6.1.32 Myosin ATPase 2e-07 

[PIRKW] tandem repeat 2e-07 

[PIRKW] heterodimer 1e-06 

[PIRKW] endocytosis 9e-07 

[PIRKW] heart 1e-06 

[PIRKW] transmembrane protein 4e-07 

[PIRKW] zinc finger 9e-07 

[PIRKW] metal binding 9e-07 

[PIRKW] DNA binding 3e-06 

[PIRKW] muscle contraction 2e-07 

[PIRKW] acetylated amino end 3e-06 

[PIRKW] actin binding 2e-07 

[PIRKW] mitosis 1e-06 

[PIRKW] microtubule binding 1e-06 

[PIRKW] ATP 2e-07 

[PIRKW] chromosomal protein 1e-06 

[PIRKW] receptor 3e-08 

[PIRKW] thick filament 2e-07 

[PIRKW] phosphoprotein 8e-06 

[PIRKW] glycoprotein 3e-08 

[PIRKW] skeletal muscle 3e-06 

[PIRKW] DNA condensation 1e-06 

[PIRKW] alternative splicing 2e-06 

[PIRKW] coiled coil 2e-07 

[PIRKW] P-loop 2e-07 

[PIRKW] heptad repeat 4e-07 

[PIRKW] methylated amino acid 2e-07 

[PIRKW] peripheral membrane protein 9e-07 

[PIRKW] cardiac muscle 6e-06 

[PIRKW] hydrolase 2e-07 

[PIRKW] muscle 2e-06 

[PIRKW] cytoskeleton 2e-06 

[PIRKW] Golgi apparatus 4e-07 

[PIRKW] calmodulin binding 9e-07 

[SUPFAM] myosin motor domain homology 2e-07 

[SUPFAM] tropomyosin TPM1 2e-06 

[SUPFAM] giantin 4e-07 

[SUPFAM] protein kinase C zinc-binding repeat homology 2e-06 

[SUPFAM] human early endosome antigen 1 9e-07 

[SUPFAM] unassigned kinesin-related proteins 4e-07 

[SUPFAM] M5 protein 8e-08 

[SUPFAM] cytoskeletal keratin 3e-06 

[SUPFAM] myosin heavy chain 2e-07 

[SUPFAM] conserved hypothetical P115 protein 1e-06 

[SUPFAM] centromere protein E 1e-06 

[SUPFAM] pleckstrin repeat homology 2e-06 

[SUPFAM] kinesin motor domain homology 4e-07 

[PROSITE] LEUCINE_ZIPPER 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 6.79 % 

[KW] COILED_COIL 27.15 % 



SEQ MQKFIEADYYELDWYYEECSDVLCAERVGQMTKTYNDIDAVTRLLEEKERDLELAARIGQ 

SEG xxxxxxxxxxxxxxx. . . 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS C 

SEQ SLLKKNKTLTERNELLEEQVEHIREEVSQLRHELSMKDELLQFYTSAAEESEPESVCSTP 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LKRNESSSSVQNYFHLDSLQKKLKDLEEENVVLRSEASQLKTETITYEEKEQQLVNDCVK 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
SEQ ELRDANVQIASISEELAKKTEDAARQQEEITHLLSQIVDLQKKAKACAVENEELVQHLGA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCC 

SEQ AKDAQRQLTAELRELEDKYAECMEMLHEAQEELKNLRNKTMPNTTSRRYHSLGLFPMDSL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ AAEIEGTMRKELQLEEAESPDITHQKRVFETVRNINQVVKQRSLTPSPMNIPGSNQSSAM 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhh 

COILS 

SEQ NSLLSSCVSTPRSSFYGSDIGNVVLDNKTNSIILETEAADLGNDERSKKPGTPGTPRLPR 

SEG xxxxxxxxxxx 

PRD hhhhhcccccccccccccccceeeeeccccceeecccccccccccccccccccccccccc 

COILS 

SEQ PGDGAEAAVPAPGELPLGEEVL 

SEG xxxx 

PRD cccccccccccccccccccccc 

COILS 



Prosite for DKFZphfkd2_4c8.2 
PS00029 139->161 LEUCINE_ZIPPER PDOC00029 



(No Pfam data available for DKFZphfkd2_4c8.2) 

Pedant information for DKFZphfkd2_4c8, frame 3 
Report for DKFZphfkd2_4c8.3 



[LENGTH] 153 

[MW] 17642.03 

[pi] 9.38 

[HOMOL] TREMBL:AB011121_1 gene: "KIAA0549"; product: "KIAA0549 protein"; Homo sapiens mRNA for KIAA0549 protein, partial cds. 2e-12 



[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 12.42 % 



SEQ MSGVRSRGRRAPPGSHDLETALRRLSLRRENYLSERRFFEEEQERKLQELAEKGELRSGS 

SEG xxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccc 

SEQ LTPTESIMSLGTHSRFSEFTGFSGMSFSSRSYLPEKLQIVKPLEGDHAGPRPLSVLLGDS 

SEG 

PRD cccccceeeccccceeeccccccccccccccccchhhhhhhhcccccccccceeeeeccc 

SEQ LWSLIHLRKAGHLCHAYSFFFRDSHPRCWFEFL 

SEG 

PRD chhhhhhhhhcccccceeeeecccccccccccc 



(No Prosite data available for DKFZphfkd2_4c8.3) 
(No Pfam data available for DKFZphfkd2_4c8.3) 
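The [KW] percentages in the two DKFZphfkd2_4c8 reports correspond to whole numbers of residues: 6.79% and 27.15% of 442 residues are 30 and 120 residues, and 12.42% of the 153-residue frame 3 peptide is 19 residues. A sketch of the conversion, assuming each percentage is the count of flagged residues (SEG 'x', COILS 'C') divided by the peptide length and rounded to two decimals; the helper names are hypothetical:

```python
def flagged_count(annotation_rows: list[str], flag: str) -> int:
    """Count residues marked with `flag` across the rows of one annotation track."""
    return sum(row.count(flag) for row in annotation_rows)


def percent(count: int, length: int) -> float:
    """Share of a peptide covered by `count` residues, rounded to two decimals."""
    return round(100 * count / length, 2)


print(percent(30, 442))   # 6.79  LOW_COMPLEXITY, DKFZphfkd2_4c8.2
print(percent(120, 442))  # 27.15 COILED_COIL, DKFZphfkd2_4c8.2
print(percent(19, 153))   # 12.42 LOW_COMPLEXITY, DKFZphfkd2_4c8.3
```

The SEG and COILS rows reproduced in this document have lost their column alignment to the sequence, so the sketch works from residue totals; with intact rows, `percent(flagged_count(rows, "x"), length)` would apply directly.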
DKFZphfkd2_4kl4 



group: intracellular transport and trafficking 

DKFZphfkd2_4kl4.3 encodes a novel 254 amino acid putative GTP-binding protein nearly identical 
to Rab6. 

Rab proteins are members of the Ras superfamily of GTPases. Rab proteins are localised to the 
cytoplasmic side of organelles and vesicles involved in the secretory (biosynthetic) and 
endocytotic pathways in eukaryotic cells. Rab proteins direct the targeting and fusion of 
transport vesicles to their acceptor membranes. 

Rab6 is a ubiquitous Ras-like GTPase involved in intra-Golgi transport. 

The new protein can find application in modulating the transport of vesicles inside the Golgi 
apparatus. 



strong similarity to Rab6 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 3084 bp 

Poly A stretch at pos. 3061, polyadenylation signal at pos. 3043 



1 GGGGCACTCA GCAGGTTGGG CTGCGGCGGC GGCGGCTGGG GAAGCCGAAG 
51 CGCCGCGCGT GAGAGATCCC GGATACATCT GCGGTTTGGG CTCCGCCACC 
101 CTCCGTCTCT CTCCCGCAGG TCTCTGAGCC GGGTGCGGAA GGAGGGAACG 
151 GCCCTAGCCT TGGGAAGCCA AAGCACACCC CTGGCTCCCG CCGACACCGC 
201 CCTCCTTCCC TTCCCAGCCG CGGGCCTCGC TCCGTGCTCG GCTACTCTGC 
251 CGGGAGGCGG CGGCGGCTGC CAGTCTGTGG CGAGCCCTGC TGCCCTCCAG 
301 CCGGGCTTCT CCAGCCGGGC TCCTCCACCG GCCCTTGCAG GGGCACAGAG 
351 AGCTCGGCGC CCGCCCTTCC GCTCGCCTTT TTCGTCAGCC GGCTGGAGGA 
401 GCATCGGTCC GGGAGGTCTC TGGGCTGAGG CGGCGACAGC TCCTCTAGTT 
451 CCACCATGTC CGCGGGCGGA GACTTCGGGA ATCCGCTGAG GAAATTCAAG 
501 CTGGTGTTCC TGGGGGAGCA AAGCGTTGCA AAGACATCTT TGATCACCAG 
551 ATTCAGGTAT GACAGTTTTG ACAACACCTA TCAGGCAATA ATTGGCATTG 
601 ACTTTTTATC AAAAACTATG TACTTGGAGG ATGGAACAAT CGGGCTTCGG 
651 CTGTGGGATA CGGCGGGTCA GGAACGTCTC CGTAGCCTCA TTCCCAGGTA 
701 CATCCGTGAT TCTGCTGCAG CTGTAGTAGT TTACGATATC ACAAATGTTA 
751 ACTCATTCCA GCAAACTACA AAGTGGATTG ATGATGTCAG AACAGAAAGA 
801 GGAAGTGATG TTATCATCAC GCTAGTAGGA AATAGAACAG ATCTTGCTGA 
851 CAAGAGGCAA GTGTCAGTTG AGGAGGGAGA GAGGAAAGCC AAAGGGCTGA 
901 ATGTTACGTT TATTGAAACT AGGGCAAAAA CTGGATACAA TGTAAAGCAG 
951 CTCTTTCGAC GTGTAGCAGC AGCTTTGCCG GGAATGGAAA GCACACAGGA 
1001 CGGAAGCAGA GAAGACATGA GTGACATAAA ACTGGAAAAG CCTCAGGAGC 
1051 AAACAGTCAG CGAAGGGGGT TGTTCCTGCT ACTCTCCCAT GTCATCTTCA 
1101 ACCCTTCCTC AGAAGCCCCC TTACTCTTTC ATTGACTGCA GTGTGAATAT 
1151 TGGCTTGAAC CTTTTCCCTT CATTAATAAC GTTTTGCAAT TCATCATTGC 
1201 TGCCTGTCTC GTGGAGGTGA TCTATTAGCT TCACAAGCAC AAAAAAAGTC 
1251 AGCGTCTTCA TTATTTATAT TTTACAAAAA GCCAAATTAT TTCAGCATAT 
1301 TCCGGTGATA ACTTTAAAAA TTAGATACAT TTTCTTAACA TTTTTTTCTT 
1351 TTTTAATGTT ATGATAATGT ACTTCAAAAT GATGGAAATC TCAACAGTAT 
1401 GAGTATGGCT TGGTTAACGA GCAGTATGTT CACAGCCTGC TTTATCTCTC 
1451 CTTGCTCTTC TCACCTCTCC CTTACCCCGT TCCCTATTTC CGTGTTCTTA 
1501 CCTAGCCTCC CCCCACTTCC TCAAAACAAA CAAGAGATGG CAAAGCAGCA 
1551 GTCCGACCAA GCCCACTGGA ATTATCCTTT AATTTTACAG ATACCACTTG 
1601 CTGTAGGCTG TGGACCAAGA TGTCCAGAAT TATTCTTGAG CACTGATGTA 
1651 AATTACTTAG ATCTTCTTTG AGGTCAGAAT TCAGCGATCA CGGTAGGCAG 
1701 TGCTTGAATG AGAAAAGCCT CCTGGTGCAT CTTCAAAATG AGTCCTAAAG 
1751 AACATACTGA GTACTTATAA GTAGCAGAAC ATAAAATGTA TTTCTGACTA 
1801 ACACAAATGG TCCTTTCACA TGTGCTTTAT TAGACTCTGG GAGAGAAAAG 
1851 TAACCAAGTG CTTCAGAACA GGTTTTTAGT ATTTACTTCT TCATGGTAAG 
1901 ATAATGAAGT TCTAATGAAC TATTTCTCCC AAGGTTTTAA AATTGTCAAG 
1951 AGTTATTCTG TTTGTTTAAA AAGTAAGAAA CCTCTGTAAG CAATAGATTT 
2001 TGCTTGGGTT TTCTTTCTTA AAAAAATAAT ACTATGCAGG CAAGACACCA 
2051 TAAAAGTTTA ATTCCTTACA GAAGAACCAG TGGAAGAATT TAAATTTGGC 
2101 ACTACGATCA AAACTACTGA ATTAGCAGAA ATAACGATAT CTAAAGCTTA 
2151 CCAGCAAAAG AACCCTCAGC AGAATAGCAA AAACTTTGCT CAGGACATTT 
2201 GAGGTCAAAT TGAAGACGGA AGACGGAAAC CGGAAACCGT TTTCTTGTAA 
2251 GCCCCTAGAG GCAGATCAGG TAAGCATACA TAGTAGAGGG AAAGGAGAGA 
2301 ATGGAAATAA AACTGAATAT TATGCAGATT TATGCCTTAT TTTTTAGCAT 
2351 TTTTTAAGGT TGGGTCTTTC AGGCTGGTTT TGGTTTGTAT TAGATCTGTA 
2401 TAGTTTAGTG ATTTAGTTTT ATATTTAAGC TACGATTAAT ATTTTTTCTT 
2451 TGGCGATATT TCTTTGCTTT TTTTTTTTAA CAACTTTCCA TTTTTAGATG 
2501 TTTCGTTGAA TCTATTTAGA GCTTCACCAT GGCAATATGT ATTTCCCTTA 
2551 AAACACTGCA AACAAATATA CTAGGAGTGT GCCCTTTTAA TCTTTACTAG 
2601 TTATTGTGAG ACTGCTGTGT AAGCTAATAA ACACATTTGT AAAAACATTG 
2651 TTTGCAGGAA GAAAACTTCG AGTTACAGGT CAGGAAAAGC CTGCTGAATT 
2701 TATGTTGTAA ACGTTACTTA ACACAGTATA AAGATGAAAA GACAACAAAA 
2751 GTATCTTCAT ACTTCCTCAT CCCCTCATTG CAACAAAACC TTAAACTGGG 
2801 AGAACCTTAG TCCCCTCTCT TTCCTCTTCC TCCTCCACTT CCCACTTATT 
2851 GCCACTTTGT AATATTCAGA GAGCACTTGG ATTATGGATC TGAATAGAGA 
2901 AATGCTTACA GATAATCATT AGCCCACATA CCAGTAACTT ATACTTAAAG 
2951 ATGGGATGGA GTTATAAAGT GCTTTTATAA TCCAATATAA TTGCTAAAGG 
3001 CAAGGGTTGA CTCTTTGTTT TATTTTGACA TGGCATGTCC TGAAATAAAT 
3051 ATTGGTTCAC TATGAAAAAA AAAAAAAAAA AAAA 
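The polyadenylation-signal positions in these headers appear to sit one base before the 1-based start of the hexamer AATAAA: here it begins at base 3044 (header: 3043), in DKFZphfkd2_4c8 at 3136 (header: 3135) and in DKFZphfkd2_4mll at 1714 (header: 1713). A Python sketch that locates the hexamer in one numbered sequence line, assuming the canonical AATAAA signal; `find_motif` is a hypothetical helper, and a motif straddling two lines would be missed unless the lines are joined first:

```python
def find_motif(line: str, motif: str = "AATAAA") -> list[int]:
    """Return the 1-based start of every occurrence of `motif` (overlaps
    included) in one numbered sequence line, e.g. '3001 CAAGGGTTGA ...'."""
    pos, *blocks = line.split()
    first, seq = int(pos), "".join(blocks)
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(first + i)
        i = seq.find(motif, i + 1)
    return hits


line = "3001 CAAGGGTTGA CTCTTTGTTT TATTTTGACA TGGCATGTCC TGAAATAAAT"
hits = find_motif(line)
print(hits)                   # [3044]
print([p - 1 for p in hits])  # [3043], the figure given in the header
```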



BLAST Results 



No BLAST result 



Medline entries 



98382468: 
Rab proteins. 

97203146: 

GTP-bound forms of rab6 induce the redistribution of Golgi 
proteins into the endoplasmic reticulum. 



Peptide information for frame 3 



ORF from 456 bp to 1217 bp; peptide length: 254 
Category: strong similarity to known protein 
Classification: unset 

Prosite motifs: BACTERIAL_OPSIN_RET (45-57) 



1 MSAGGDFGNP LRKFKLVFLG EQSVAKTSLI TRFRYDSFDN TYQAIIGIDF 
51 LSKTMYLEDG TIGLRLWDTA GQERLRSLIP RYIRDSAAAV VVYDITNVNS 
101 FQQTTKWIDD VRTERGSDVI ITLVGNRTDL ADKRQVSVEE GERKAKGLNV 
151 TFIETRAKTG YNVKQLFRRV AAALPGMEST QDGSREDMSD IKLEKPQEQT 
201 VSEGGCSCYS PMSSSTLPQK PPYSFIDCSV NIGLNLFPSL ITFCNSSLLP 
251 VSWR 
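As a check on this frame 3 translation, the first codons of the ORF (base 456 onward, from the sequence line numbered 451) translate to the N-terminal MSAGGDFGNPLRKFK, and bases 1206-1220 translate to the C-terminal VSWR followed by a TGA stop. A minimal Python sketch using the standard genetic code in the usual compact 64-codon layout; the helper names are illustrative:

```python
BASES = "TCAG"
# Standard genetic code in TCAG order: TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_numbered_line(line: str) -> tuple[int, str]:
    """Split a line such as '451 CCACCATGTC ...' into (451, 'CCACCATGTC...')."""
    pos, *blocks = line.split()
    return int(pos), "".join(blocks)


def slice_bases(line: str, first: int, last: int) -> str:
    """Return bases first..last (1-based, inclusive) from one numbered line."""
    start, seq = parse_numbered_line(line)
    return seq[first - start:last - start + 1]


def translate(dna: str) -> str:
    """Translate whole codons only; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


n_term = slice_bases("451 CCACCATGTC CGCGGGCGGA GACTTCGGGA ATCCGCTGAG GAAATTCAAG", 456, 500)
c_term = slice_bases("1201 TGCCTGTCTC GTGGAGGTGA TCTATTAGCT TCACAAGCAC AAAAAAAGTC", 1206, 1220)
print(translate(n_term))  # MSAGGDFGNPLRKFK
print(translate(c_term))  # VSWR*
```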

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4kl4, frame 3 

PIR:G34323 GTP-binding protein Rab6 - human, N = 1, Score = 944, P = 
6.5e-95 

TREMBL:CET25G12_2 gene: "T25G12.4"; Caenorhabditis elegans cosmid 
T25G12., N = 1, Score = 756, P = 5.4e-75 

TREMBL:NTNTRAF_1 gene: "Nt-rab6"; Nicotiana tabacum SR1 Nt-rab6 mRNA, 
complete cds., N = 1, Score = 698, P = 7.6e-69 

TREMBL:D84314_1 product: "rab6"; Drosophila melanogaster mRNA for 
rab6, complete cds., N = 1, Score = 836, P = 1.9e-83 

PIR:T01588 small GTP-binding protein F16B22.10 - Arabidopsis thaliana, 
N = 1, Score = 704, P = 1.8e-69 



>PIR:G34323 GTP-binding protein Rab6 - human 
Length = 208 

HSPs: 

Score = 944 (141.6 bits), Expect = 6.5e-95, P = 6.5e-95 
Identities = 186/208 (89%), Positives = 190/208 (91%) 

Query: 1 MSAGGDFGNPLRKFKLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDG 60 

MS GGDFGNPLRKFKLVFLGEQSV KTSLITRF YDSFDNTYQA IGIDFLSKTMYLED 
Sbjct: 1 MSTGGDFGNPLRKFKLVFLGEQSVGKTSLITRFMYDSFDNTYQATIGIDFLSKTMYLEDR 60 

Query: 61 TIGLRLWDTAGQERLRSLIPRYIRDSAAAVVVYDITNVNSFQQTTKWIDDVRTERGSDVI 120 

T+ L+LWDTAGQER RSLIP YIRDS AVVVYDITNVNSFQQTTKWIDDVRTERGSDVI 
Sbjct: 61 TVRLQLWDTAGQERFRSLIPSYIRDSTVAVVVYDITNVNSFQQTTKWIDDVRTERGSDVI 120 

Query: 121 ITLVGNRTDLADKRQVSVEEGERKAKGLNVTFIETRAKTGYNVKQLFRRVAAALPGMEST 180 

I LVGN+TDLADKRQVS+EEGERKAK LNV FIET AK GYNVKQLFRRVAAALPGMEST 
Sbjct: 121 IMLVGNKTDLADKRQVSIEEGERKAKELNVMFIETSAKAGYNVKQLFRRVAAALPGMEST 180 

Query: 181 QDGSREDMSDIKLEKPQEQTVSEGGCSC 208 

QD SREDM DIKLEKPQEQ VSEGGCSC 
Sbjct: 181 QDRSREDMIDIKLEKPQEQPVSEGGCSC 208 

Pedant information for DKFZphfkd2_4kl4, frame 3 

Report for DKFZphfkd2_4kl4.3 



[LENGTH] 254 

[MW] 28385.29 

[pI] 7.58 

[HOMOL] PIR:G34323 GTP-binding protein Rab6 - human 1e-102 

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YLR262c] 7e-60 

[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YLR262c] 7e-60 

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YOR089c] 2e-33 

[FUNCAT] 08.19 cellular import [S. cerevisiae, YOR089c] 2e-33 

[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YOR089c] 2e-33 

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YOR089c] 2e-33 

[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210w] 3e-28 

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YFL005w] 8e-27 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL005w] 8e-27 

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YOR101w] 2e-21 

[FUNCAT] 11.10 cell death [S. cerevisiae, YOR101w] 2e-21 

[FUNCAT] 01.03.13 regulation of nucleotide metabolism [S. cerevisiae, YOR101w] 2e-21 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR101w] 2e-21 

[FUNCAT] 03.99 other cell growth, cell division and dna synthesis activities [S. cerevisiae, YOR101w] 2e-21 

[FUNCAT] 10.04.07 g-proteins [S. cerevisiae, YOR101w] 2e-21 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL098c] 6e-19 

[FUNCAT] 11.01 stress response [S. cerevisiae, YNL098c] 6e-19 

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YNL098c] 6e-19 

[FUNCAT] 04.07 rna transport [S. cerevisiae, YOR185c] 6e-16 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YOR185c] 6e-16 

[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YOR185c] 6e-16 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YPR165w] 4e-13 

[FUNCAT] 10.02.07 g-proteins [S. cerevisiae, YPR165w] 4e-13 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YCR027c] 2e-09 

[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YLR229c] 8e-08 

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 8e-08 

[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL180c] 1e-05 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR094w] 5e-05 

[BLOCKS] BL01115A GTP-binding nuclear protein ran proteins 

[SCOP] d1as3_2 3.29.1.4.12 Transducin (alpha subunit), insertion domai 1e-32 

[SCOP] d1mh1 3.29.1.4.2 Rac1 [Human (Homo sapiens) 2e-51 

[SCOP] d5p21 3.29.1.4.1 cH-p21 Ras protein [human (Homo sapiens) 7e-53 

[SCOP] d1hura_ 3.29.1.4.8 ADP-ribosylation factor 1 (ARF1) [human (Hom 1e-46 

[SCOP] d1a2kc_ 3.29.1.4.5 Ran Nuclear transport factor-2 (NTF2) (Do 6e-60 

[PIRKW] nucleus 2e-14 

[PIRKW] cell cycle control 5e-15 

[PIRKW] membrane trafficking 3e-71 

[PIRKW] endoplasmic reticulum 1e-29 

[PIRKW] phosphoprotein 1e-29 

[PIRKW] prenylated cysteine 2e-36 

[PIRKW] signal transduction 5e-15 

[PIRKW] transforming protein 5e-30 

[PIRKW] purine nucleotide binding 1e-28 

[PIRKW] alternative splicing 1e-18 

[PIRKW] P-loop 3e-71 

[PIRKW] lipoprotein 2e-36 

[PIRKW] proto-oncogene 1e-20 

[PIRKW] methylated carboxyl end 1e-20 

[PIRKW] membrane protein 1e-29 

[PIRKW] GTP binding 3e-71 

[PIRKW] thiolester bond 1e-29 

[PIRKW] Golgi apparatus 1e-29 

[SUPFAM] ras transforming protein 1e-76 

[PROSITE] BACTERIAL_OPSIN_RET 1 

[PFAM] Ras family (contains ATP/GTP binding P-loop) 

[KW] Alpha_Beta 

[KW] 3D 



SEQ MSAGGDFGNPLRKFKLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDG 

1kao- CCEEEEEEECTTTTCHHHHHHHHHHCCCCCCCTTTTC-EEEEEEEEETTE 

SEQ TIGLRLWDTAGQERLRSLIPRYIRDSAAAVVVYDITNVNSFQQTTKWIDDVRTERGSDVI 

1kao- EEEEEEEECCTTTTCHHHHHHHHHHCCEEEEEEETTTHHHHHHHHHHHHHHHHHTTTCCC 

SEQ ITLVGNRTDLADKRQVSVEEGERKAKGLNVTFIETRAKTGYNVKQLFRRVAAALPGMEST 

1kao- EEEEEETTTTGGGCCCCHHHHHHHHHHHCCCEEECTTTTHHHHHHHHHHH 

SEQ QDGSREDMSDIKLEKPQEQTVSEGGCSCYSPMSSSTLPQKPPYSFIDCSVNIGLNLFPSL 

1kao- 

SEQ ITFCNSSLLPVSWR 

1kao- 



Prosite for DKFZphfkd2_4kl4.3 
PS00327 45->57 BACTERIAL_OPSIN_RET PDOC00291 



Pfam for DKFZphfkd2_4kl4.3 

HMM_NAME Ras family (contains ATP/GTP binding P-loop) 

HMM        *KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDFYtKTIEIDGKtIK 
            KLV++G+ +V K++L RF +++F++ Y + IG+DF++KT+++++ TI 
Query   15  KLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDGTIG 

HMM         LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFENIrNWweEIrR 
            L +WDTAGQER RS+ P Y+R++ ++++VYDITN SF+ ++W++++R+ 
Query   64  LRLWDTAGQERLRSLIPRYIRDSAAAVVVYDITNVNSFQQTTKWIDDVRT 

HMM         HCDrDENVPIMLVGNKCDLEDQRQVStEEGQeFAREWGAIPFMETSAKTN 
            + ++V+I LVGN +DL+D+RQVS EEG+ A+ ++ + F+ET AKT+ 
Query  114  ERG--SDVIITLVGNRTDLADKRQVSVEEGERKAKGLN-VTFIETRAKTG 

HMM         iNVEEAFMEIvRellqrMqe.q.NqteNinidQpsrnrk rCCCIM* 
            +NV++ F +++ +++ +++ + +++++++I+ ++++ + +C+ + 
Query  161  YNVKQLFRRVAAALPGMESTQDGSREDMSDIKLEKPQEQTVSEGGCS-C 
DKFZphfkd2_4mll 



group: transmembrane protein 

DKFZphfkd2_4mll encodes a novel 159 amino acid protein with weak similarity to the putative 
membrane protein YMR034c of S. cerevisiae. 

The novel protein contains 4 transmembrane regions. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes and as a new marker of neuronal cells. 

weak similarity to YMR034c 

complete cDNA, complete cds, no EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1749 bp 

Poly A stretch at pos. 1727, polyadenylation signal at pos. 1713 

1 GGGGTCCTCA AAGCCGCCGG AGCAACCCCC AGGTCTTTAC TTTACAATCG 
51 GCAATTTGAC TTGCTCTGCT GCATGTCTGG AGGGACCAAG GAAAGTGTGG 

101 AGACGCTCCA AGGATTAGGT GATCGGAGCT TGAAAAGAAA AAAAGCCAAA 

151 CAAATAAACA AAACCCACCC ACCCTAACGA ATATGAGGCT GCTGGAGAGA 

201 ATGAGGAAAG ACTGGTTCAT GGTCGGAATA GTGCTGGCGA TCGCTGGAGC 

251 TAAACTGGAG CCGTCCATAG GGGTGAATGG GGGACCACTG AAGCCAGAAA 

301 TAACTGTATC CTACATTGCT GTTGCAACAA TATTCTTTAA CAGTGGACTA 

351 TCATTGAAAA CAGAGGAGCT GACCAGTGCT TTGGTGCATC TAAAACTGCA 

401 TCTTTTTATT CAGATCTTTA CTCTTGCATT CTTCCCAGCA ACAATATGGC 

451 TTTTTCTTCA GCTTTTATCA ATCACACCCA TCAACGAATG GCTTTTAAAA 

501 GGTTTGCAGA CAGTAGGTTG CATGCCTCCG CCTGTGTCTT CTGCAGTGAT 

551 TTTAACCAAG GCAGTTGGTG GAAATGAGGC AGCTGCAATA TTTAATTCAG 

601 CCTTTGGAAG TTTTTTGGTA AGTAAACATA GTTTAACTTG TCTATTACAA 

651 CTTTTGCTGT GATATTGTGT ATATGAAAGA TTTAGTGAAA GCTGGATTTG 

701 TTTTACTCTT TGGTTAAGTA TAAAAATTGT TGAATCTTTT CATGTGCCAG 

751 TATCCATACC CTGAAGAAAA GTAGTTAATG AATAAAGCAA ATGTTCTCTT 

801 ACAATATATT TTGGAGGTTT GGATTTTAAA ATTCCATTTA ATGAATTCAA 

851 GGAATCAATT AAAACACTAT GTGTCTCCTT ATAGAGGTTA TGTCAATATA 

901 TTGATCATTT AATGAGGTCT TTTAGATTAT TATTATTTTG TATCATGGGA 

951 CTGAGGATTT TGAAAAGGAA ACATGACCCA GCTGGTCAGA AAGGGAATGC 
1001 TAATTTACTT GTTGACATGC CATTTATTTT GTACATTTCA CTGTCAAAGA 
1051 AGCTACTGGC TTGGATGCTT CTGAGAAATC TATGTGAGAA AAAATTTGAA 
1101 AGGAAGATAT GACTAATGAG TAATTTGCAA GTAAATGTTG TATCTATATA 
1151 TATATATATA TAAAGATTCA AAAGTAGTTC AGCTTTCATA AGTAGAACCA 
1201 ATATAAGGAC GTTGTTTTAG CATTTTTAAT CATTATTTTT AAATAAATGA 
1251 TGTAACAGAG GCTTGATTTG TGTTATGAAA GATTGAGAAA CTAAATTTTC 
1301 TGTTGATTTA ATTTTTTTGT GCCTTAAAAC TTTGTTAAAT TCCTGAAGTT 
1351 AATTATCATA TTGTACTTTT TGGGGCATAA CTCATTAGCA GATATGTAGT 
1401 GCAGTGATTT ACAAATAATT GAGAGTAAAA TCAGTGATGT ATAAACTAGT 
1451 TCATGAGTCT AGGTAAAATA TCAATTACCT CTGTTTAAAA TGCTCTGTTA 
1501 ATTATTATTG TATGTATTTA AATGTAGTTA AAGCTTTTAA ACATGTTGTT 
1551 ACATAGTGTT AATTCTACAC AGTGCTACAC AGCTTTTAGT GTCACATAGC 
1601 CTTACAGAGT TTATAATGAT GTAGCATCTG CAAAATATAT GCATAGCTTA 
1651 TATCCTATTT TTATAGAGCC AGTAATGGTT TTTGTGATGC TGTATTACTT 
1701 CTGGGTTTTA GACAATAAAG TCTGTTTAAC AAAAAAAAAA AAAAAAAAA 

BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 



Peptide information for frame 3 
ORF from 183 bp to 659 bp; peptide length: 159 
Category: similarity to unknown protein 



1 MRLLERMRKD WFMVGIVLAI AGAKLEPSIG VNGGPLKPEI TVSYIAVATI 
51 FFNSGLSLKT EELTSALVHL KLHLFIQIFT LAFFPATIWL FLQLLSITPI 
101 NEWLLKGLQT VGCMPPPVSS AVILTKAVGG NEAAAIFNSA FGSFLVSKHS 
151 LTCLLQLLL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4mll, frame 3 

PIR:S53951 probable membrane protein YMR034c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 171, P = 3.2e-12 

PIR:A65015 yfeH protein - Escherichia coli (strain K-12), N = 1, Score 
= 131, P = 4.2e-08 



>PIR:S53951 probable membrane protein YMR034C - yeast (Saccharomyces 
cerevisiae) 

Length = 434 

HSPs: 

Score = 171 (25.7 bits), Expect = 3.2e-12, P = 3.2e-12 
Identities = 38/144 (26%), Positives = 72/144 (50%) 

Query: 5 ERMRKDWFMVGIVLAIAGAKLEPSIGVNGGPLKPEITVSYIAVATIFFNSGLSLKTEELT 64 

E ++ WF + + + I A+ P+ +GG +K + ++ Y VA IF SGL +K+ L 
Sbjct: 18 EFLKSQWFFICLAILIVIARFAPNFARDGGLIKGQYSIGYGCVAWIFLQSGLGMKSRSLM 77 

Query: 65 SALVHLKLHLFIQIFTLAFFPATIWLF---LQLLSITPINEWLLKGLQTVGCMPPPVSSA 121 

+ +++ +HI++ +++F +++ I++W+L GL P V+S 

Sbjct: 78 ANMLNWRAHATILVLSFLITSSIVYGFCCAVKAANDPKIDDWVLIGLILTATCPTTVASN 137 

Query: 122 VILTKAVGGNEAAAIFNSAFGSFL 145 

VI +T GGN + G+ L 

Sbjct: 138 VIMTTNAGGNSLLCVCEVFIGNLL 161 



Pedant information for DKFZphfkd2_4mll, frame 3 



Report for DKFZphfkd2_4mll.3 



[LENGTH] 159 

[MW] 17282.92 

[pI] 9.06 

[HOMOL] PIR:S53951 probable membrane protein YMR034c - yeast (Saccharomyces cerevisiae) 
5e-12 

[FUNCAT] 99 unclassified proteins (S. cerevisiae, YMR034c] 2e-13 

[PROSITE] MYRISTYL 2 

[PROSITE] PKC_PHOSPHO_SITE 1 

[KW] TRANSMEMBRANE 4 



SEQ MRLLERMRKDWFMVGIVLAIAGAKLEPSIGVNGGPLKPEITVSYIAVATIFFNSGLSLKT 

PRD ccchhhhhhhhhhhhhhhhhhhhhcccccccccccccceeeeeeeccccccccccchhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMM . . 

SEQ EELTSALVHLKLHLFIQIFTLAFFPATIWLFLQLLSITPINEWLLKGLQTVGCMPPPVSS 

PRD hhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccchhhhhhhheeeecccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ AVILTKAVGGNEAAAIFNSAFGSFLVSKHSLTCLLQLLL 

PRD ceeeeeccccchhhhhhhcccccceeecceeeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphfkd2_4mll.3 

PS00005 57->60 PKC_PHOSPHO_SITE PDOC00005 

PS00008 15->21 MYRISTYL PDOC00008 

PS00008 129->135 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphfkd2_4mll.3) 
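The "start->end" ranges in these Prosite listings appear to give the end as one past the last matching residue: PS00005 ([ST]-x-[RK]) spans three residues, yet the hit S57-L58-K59 is listed as 57->60, and the LEUCINE_ZIPPER hit in DKFZphfkd2_4c8.2, whose four leucines sit at 139, 146, 153 and 160, is listed as 139->161. The Python sketch below hand-translates the two Prosite patterns reported for this peptide into regular expressions and reproduces the three listed hits under that convention. It keeps non-overlapping matches only, so a second MYRISTYL match starting at G130, which overlaps the one at G129, is skipped as in the listing; a real Prosite scanner may treat overlaps differently.

```python
import re

# Hand translation of the Prosite patterns (an assumption, not ScanProsite itself):
#   PS00005 PKC_PHOSPHO_SITE  [ST]-x-[RK]
#   PS00008 MYRISTYL          G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
PATTERNS = {
    "PKC_PHOSPHO_SITE": re.compile(r"[ST].[RK]"),
    "MYRISTYL": re.compile(r"G[^EDRKHPFYW].{2}[STAGCN][^P]"),
}

# DKFZphfkd2_4mll frame 3 peptide (159 residues).
PEPTIDE = (
    "MRLLERMRKDWFMVGIVLAIAGAKLEPSIGVNGGPLKPEITVSYIAVATI"
    "FFNSGLSLKTEELTSALVHLKLHLFIQIFTLAFFPATIWLFLQLLSITPI"
    "NEWLLKGLQTVGCMPPPVSSAVILTKAVGGNEAAAIFNSAFGSFLVSKHS"
    "LTCLLQLLL"
)


def scan(peptide: str) -> list[tuple[str, int, int]]:
    """Return (name, start, end) hits with a 1-based start and
    end = last matching residue + 1, sorted by start."""
    hits = []
    for name, rx in PATTERNS.items():
        for m in rx.finditer(peptide):  # non-overlapping matches only
            hits.append((name, m.start() + 1, m.end() + 1))
    return sorted(hits, key=lambda h: h[1])


for name, s, e in scan(PEPTIDE):
    print(f"{s}->{e} {name}")
```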
DKFZphutel_17k7 



group: uterus derived 

DKFZphutel_17k7 encodes a novel 520 amino acid protein with weak similarity to S. cerevisiae 
Fip1. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 



similarity to S. cerevisiae Fip1 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 1914 bp 

Poly A stretch at pos. 1897, polyadenylation signal at pos. 1867 



1 CGGACGCGTG GGCGGACGCG TGGGGCCTTC CTGGGATTGG AGTCTCGAGC 
51 TTTCTTCGTT CGTTCGCCGG CGGGTTCGCG CCCTTCTCGC GCCTCGGGGC 
101 TGCGAGGCTG GGGAAGGGGT TGGAGGGGGC TGTTGATCGC CGCGTTTAAG 
151 TTGCGCTCGG GGCGGCCATG TCGGCCGGCG AGGTCGAGCG CCTAGTGTCG 
201 GAGCTGAGCG GCGGGACCGG AGGGGATGAG GAGGAAGAGT GGCTCTATGG 
251 CGATGAAAAT GAAGTTGAAA GGCCAGAAGA AGAAAATGCC AGTGCTAATC 
301 CTCCATCTGG AATTGAAGAT GAAACTGCTG AAAATGGTGT ACCAAAACCG 
351 AAAGTGACTG AGACCGAAGA TGATAGTGAT AGTGACAGCG ATGATGATGA 
401 AGATGATGTT CATGTCACTA TAGGAGACAT TAAAACGGGA GCACCACAGT 
451 ATGGGAGTTA TGGTACAGCA CCTGTAAATC TTAACATCAA GACAGGGGGA 
501 AGAGTTTATG GAACTACAGG GACAAAAGTC AAAGGAGTAG ACCTTGATGC 
551 ACCTGGAAGC ATTAATGGAG TTCCACTCTT AGAGGTAGAT TTGGATTCTT 
601 TTGAAGATAA ACCATGGCGT AAACCTGGTG CTGATCTTTC TGATTATTTT 
651 AATTATGGGT TTAATGAAGA TACCTGGAAA GCTTACTGTG AAAAACAAAA 
701 GAGGATACGA ATGGGACTTG AAGTTATACC AGTAACCTCT ACTACAAATA 
751 AAATTACGGT ACAGCAGGGA AGAACTGGAA ACTCAGAGAA AGAAACTGCC 
801 CTTCCATCTA CAAAAGCTGA GTTTACTTCT CCTCCTTCTT TGTTCAAGAC 
851 TGGGCTTCCA CCGAGCAGGA GATTACCTGG GGCAATTGAT GTTATCGGTC 
901 AGACTATAAC TATCAGCCGA GTAGAAGGCA GGCGACGGGC AAATGAGAAC 
951 AGCAACATAC AGGTCCTTTC TGAAAGATCT GCTACTGAAG TAGACAACAA 
1001 TTTTAGCAAA CCACCTCCGT TTTTCCCTCC AGGAGCTCCT CCCACTCACC 
1051 TTCCACCTCC TCCATTTCTT CCACCTCCTC CGACTGTCAG CACTGCTCCA 
1101 CCTCTGATTC CACCACCGGG TTTTCCTCCT CCACCAGGCG CTCCACCTCC 
1151 ATCTCTTATA CCAACAATAG AAAGTGGACA TTCCTCTGGT TATGATAGTC 
1201 GTTCTGCACG TGCATTTCCA TATGGCAATG TTGCCTTTCC CCATCTTCCT 
1251 GGTTCTGCTC CTTCGTGGCC TAGTCTTGTG GACACCAGCA AGCAGTGGGA 
1301 CTATTATGCC AGAAGAGAGA AAGACCGAGA TAGAGAGAGA GACAGAGACA 
1351 GAGAGCGAGA CCGTGATCGG GACAGAGAAA GAGAACGCAC CAGAGAGAGA 
1401 GAGAGGGAGC GTGATCACAG TCCTACACCA AGTGTTTTCA ACAGCGATGA 
1451 AGAACGATAC AGATACAGGG AATATGCAGA AAGAGGTTAT GAGCGTCACA 
1501 GAGCAAGTCG AGAAAAAGAA GAACGACATA GAGAAAGACG ACACAGGGAG 
1551 AAAGAGGAAA CCAGACATAA GTCTTCTCGA AGTAATAGTA GACGTCGCCA 
1601 TGAAAGTGAA GAAGGAGATA GTCACAGGAG ACACAAACAC AAAAAATCTA 
1651 AAAGAAGCAA AGAAGGAAAA GAAGCGGGCA GTGAGCCTGC CCCTGAACAG 
1701 GAGAGCACCG AAGCTACACC TGCAGAATAG GCATGGTTTT GGCCTTTTGT 
1751 GTATATTAGT ACCAGAAGTA GATACTATAA ATCTTGTTAT TTTTCTGGAT 
1801 AATGTTTAAG AAATTTACCT TAAATCTTGT TCTGTTTGTT AGTATGAAAA 
1851 GTTAACTTTT TTTCCAAAAT AAAAGAGTGA ATTTTTCATG TTAAGTTAAA 
1901 AAAAAAAAAA AAAA 
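The header for this clone reports a poly(A) stretch at position 1897 and a polyadenylation signal at position 1867. As a rough sketch of how such positions can be located, the Python below scans a cDNA for the canonical AATAAA hexamer and the common ATTAAA variant, and finds where the 3'-terminal run of A's begins. The hexamer set, the minimum tail length of 8 and the 1-based coordinates are assumptions of this sketch; the software that produced the report may apply different rules, so the sketch is not guaranteed to reproduce the reported positions.

```python
import re
from typing import List, Optional, Tuple

# Assumption: the canonical hexamer plus its most common variant.
SIGNALS = ("AATAAA", "ATTAAA")


def polya_signals(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start, hexamer) for every signal occurrence, overlaps included."""
    seq = seq.upper()
    hits = []
    for hexamer in SIGNALS:
        # A zero-width lookahead lets finditer report overlapping occurrences.
        for match in re.finditer(f"(?={hexamer})", seq):
            hits.append((match.start() + 1, hexamer))
    return sorted(hits)


def polya_tail_start(seq: str, min_len: int = 8) -> Optional[int]:
    """Return the 1-based start of the 3'-terminal run of A's, or None if it is too short."""
    seq = seq.upper()
    body = seq.rstrip("A")
    if len(seq) - len(body) < min_len:
        return None
    return len(body) + 1
```

For example, `polya_signals("CCAATAAAGG")` returns `[(3, 'AATAAA')]`. Reporting every hit rather than only the last one leaves the choice of the "real" signal (usually the one 10 to 30 nt upstream of the tail) to the reader.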



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 
Peptide information for frame 3



ORF from 168 bp to 1727 bp; peptide length: 520 
Category: similarity to known protein 



1 MSAGEVERLV SELSGGTGGD EEEEWLYGDE NEVERPEEEN ASANPPSGIE
51 DETAENGVPK PKVTETEDDS DSDSDDDEDD VHVTIGDIKT GAPQYGSYGT
101 APVNLNIKTG GRVYGTTGTK VKGVDLDAPG SINGVPLLEV DLDSFEDKPW
151 RKPGADLSDY FNYGFNEDTW KAYCEKQKRI RMGLEVIPVT STTNKITVQQ
201 GRTGNSEKET ALPSTKAEFT SPPSLFKTGL PPSRRLPGAI DVIGQTITIS
251 RVEGRRRANE NSNIQVLSER SATEVDNNFS KPPPFFPPGA PPTHLPPPPF
301 LPPPPTVSTA PPLIPPPGFP PPPGAPPPSL IPTIESGHSS GYDSRSARAF
351 PYGNVAFPHL PGSAPSWPSL VDTSKQWDYY ARREKDRDRE RDRDRERDRD
401 RDRERERTRE RERERDHSPT PSVFNSDEER YRYREYAERG YERHRASREK
451 EERHRERRHR EKEETRHKSS RSNSRRRHES EEGDSHRRHK HKKSKRSKEG
501 KEAGSEPAPE QESTEATPAE
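The ORF line for this clone gives coordinates 168 to 1727 and a peptide length of 520. The span is 1727 - 168 + 1 = 1560 nt, exactly 520 codons, so the stated range appears to exclude the stop codon (the TAG at 1728-1730 of the insert sequence). A minimal Python sketch of conceptual translation with the standard genetic code (NCBI translation table 1) follows; `translate` and `orf_peptide_length` are illustrative names, not part of any tool used to produce the report.

```python
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, stop_at_stop: bool = True) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


def orf_peptide_length(start: int, end: int) -> int:
    """Peptide length for a 1-based, inclusive ORF span that excludes the stop codon."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return length // 3
```

Translating the first 33 nt of the ORF, ATGTCGGCCGGCGAGGTCGAGCGCCTAGTGTCG, gives MSAGEVERLVS, the first eleven residues of the peptide listed above.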



BLASTP hits 



Entry AF016427_4 from database TREMBL: 

gene: "F32D1.9"; Caenorhabditis elegans cosmid F32D1. 

Score = 392, P = 1.8e-36, identities = 156/519, positives = 212/519

Entry S62454 from database PIR: 

hypothetical protein SPAC22G7.10 - fission yeast (Schizosaccharomyces 
pombe) 

Score = 246, P = 2.0e-22, identities = 62/163, positives = 91/163

Entry A56545 from database PIR: 

FIP1 protein - yeast (Saccharomyces cerevisiae) 

Score = 186, P = 2.9e-16, identities = 56/206, positives = 92/206



Alert BLASTP hits for DKFZphutel_17k7, frame 3 

TREMBLNEW:AF109907_1 product: "S164"; Homo sapiens S164 gene, partial 
cds; PS1 and hypothetical protein genes, complete cds; and S171 gene, 
partial cds., N = 2, Score = 236, P = 1.5e-16



>TREMBLNEW:AF109907_1 product: "S164"; Homo sapiens S164 gene, partial cds;
PS1 and hypothetical protein genes, complete cds; and S171 gene, partial 
cds. 

Length = 735 

HSPs: 

Score = 236 (35.4 bits), Expect = 1.5e-16, Sum P(2) = 1.5e-16 
Identities = 51/120 (42%), Positives = 76/120 (63%) 



Query: 383
Sbjct: 227
Query: 440
Sbjct: 286
Query: 499
Sbjct: 346


Score = 214 (32.1 bits), Expect = 4.4e-14, Sum P(2) = 4.4e-14
Identities = 50/133 (37%), Positives = 75/133 (56%)

Query: 383
Sbjct: 208
Query: 441
Sbjct: 267
Query: 501
Sbjct: 325

REK+++RER+R+R+RDRDR +ER+R R+RER+RD
S + +++R R RE +
ER
ER R
+ ER RER R RE+E R +
+ +R E +E D++ R K ++ R K
RE++R+R ER+R+RER+R+R++E+ER RERER+RD



+R++ 



E+ R+R RE+E R + 



R R 



D ER R R+ ER 



R + ++ K 



+E 



E A E+ 






WO 01/12659 



PCT/IB00/01496 



Score = 214 (32.1 bits), Expect = 4.4e-14, Sum P(2) = 4.4e-14
Identities = 55/141 (39%), Positives = 80/141 (56%)

Query: 383 REKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNS-DEERYRYREYAERG 440
           RE++R+R ER+R+RER+R+R++E+ER RERER+RD T D ER R R+ ER
Sbjct: 208 RERERERREREREREREREREKEKERERERERDRDRDRTKERDRDRDRERDRDRD-RERS 266

Query: 441 YERHR-ASREKEE-RHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRS 497
           +R++ SR +E+ R RER R RE+E R+ REE RKKKR
Sbjct: 267 SDRNKDRSRSREKSRDREREREREREREREREREREREREREREREREREREKDKKRDRE 326

Query: 498 KEGKEAGSEPAPEQESTEATPA 519
           ++ ++A E++ E A
Sbjct: 327 EDEEDAYERRKLERKLREKEAA 348

Score = 210 (31.5 bits), Expect = 1.2e-13, Sum P(2) = 1.2e-13
Identities = 59/142 (41%), Positives = 78/142 (54%)

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNS DEERYRYREYAER 439
           RE++RDR+RDR +ERDRDRDRER+R R+RER D + S D ER R RE ER
Sbjct: 235 RERERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKDRSRSREKSRDRERERERE-RER 293

Query: 440 GYERHRA-SREKE-ERHRER-RHREKEETRHKSS RSNSRRRHESEEGDSHRRH 489
           ER R RE+E ER RER R REK++ R + R R+ +E R
Sbjct: 294 EREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLREKEAAYQERL 353

Query: 490 KHKKSKRSKEGKEAGSEPAPEQE 512
           K+ + + K+ +E E E+E
Sbjct: 354 KNWEIRERKKTREYEKEAEREEE 376

Score = 205 (30.8 bits), Expect = 4.4e-13, Sum P(2) = 4.4e-13
Identities = 59/149 (39%), Positives = 83/149 (55%)

Query: 372 DTSKQWDYYARREKDRDR--ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEE 429
           + K+ + R++DRDR E RDRDR+ R+ RDRD R+ RE R+ +R ++R S S D E
Sbjct: 228 EKEKERERERERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKDRSRSREKS RDRE 284

Query: 430 RYRYREYAERGYERHRA-SREKE-ERHRER-RHREKEETRHKSS RSNSRRRHE 479
           R R RE ER ER R RE+E ER RER R REK++ R + R R+
Sbjct: 285 RERERE-REREREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLR 343

Query: 480 SEEGDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512
           +E R K+ + + K+ +E E E+E
Sbjct: 344 EKEAAYQERLKNWEIRERKKTREYEKEAEREEE 376

Score = 202 (30.3 bits), Expect = 9.6e-13, Sum P(2) = 9.6e-13
Identities = 49/117 (41%), Positives = 70/117 (59%)

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYAERGYE 442
           REK RDRER+R+RER+R+R+RERER RERERER+ D++R R E E YE
Sbjct: 277 REKSRDRERERERERERERERERERERERERERERERER-EKDKKRDR-EEDEEDAYE 334

Query: 443 RHRASREKEERHRERRHREKEETRHKSSRSNSRR-RHESEEGDSHRRHKHKKSKRSKE 499
           R + E++ R +E ++E+ + R +R E+E + RR K++KR KE
Sbjct: 335 RRKL--ERKLREKEAAYQERLKNWEIRERKKTREYEKEAEREEERRREMAKEAKRLKE 390

Score = 183 (27.5 bits), Expect = 1.2e-10, Sum P(2) = 1.2e-10
Identities = 52/141 (36%), Positives = 79/141 (56%)

Query: 372 DTSKQWDYY-ARREKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEE 429
           DT K+ + ++EK+R E++R RER+R+R+RERER RERERER+ ++E
Sbjct: 178 DTHKKLEEEKGKKEKERQEIEKER-RERERERERERER-RERERERERER EREKE 230

Query: 430 RYRYREYAERGYERHRASREKEERHRER RHREKEETRHKSSRSNSRRRHESEEGDSH 486
           + R RE ER +R R +R RER R RE+ R+K RS SR + E +
Sbjct: 231 KERERE-RERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKD-RSRSREKSRDRERERE 288

Query: 487 RRHKHKKSKRSKEGKEAGSEPAPEQE 512
           R + ++ + + +E E E+E
Sbjct: 289 RERERERERERERERERERERERERE 314

Score = 171 (25.7 bits), Expect = 2.5e-09, Sum P(2) = 2.5e-09
Identities = 49/150 (32%), Positives = 78/150 (52%)

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYAERGYE 442
           RE++R+RER+R+ RER+R+R+RERER RERERER+ +E+ Y R+ + E
Sbjct: 285 REREREREREREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLRE 344

Query: 443 RHRASREK EERHRERRHR EKEETRHKSSRSNSRRRHES-EEGDSHRRH-KH 491
           + A +E+ ER + R + E+EE R + ++R E E+ D R K+
Sbjct: 345 KEAAYQERLKNWEIRERKKTREYEKEAEREEERRREMAKEAKRLKEFLEDYDDDRDDPKY 404

Query: 492 KKSKRSKEGKEAGSEPAPEQESTE 515
           +K R +E + E ++E E
Sbjct: 405 YRGSALQKRLRDREKEMEADERDRKREKEE 434

Score = 162 (24.3 bits), Expect = 2.4e-08, Sum P(2) = 2.4e-08
Identities = 45/141 (31%), Positives = 74/141 (52%)

Query: 372 DTSKQWDYYARREKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERY 431
           + SK D + + E+++ ++ +E +++R RERER RERERER + ER
Sbjct: 172 EISKFRDTHKKLEEEKGKKEKERQEIEKER-RERERERERERERRERERER--ERERERE 228

Query: 432 RYREYAERGYERHRASREKEERHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHK 490
           + +E ER ER R +ER R+R R R+++ R +SS N R E+ R +
Sbjct: 229 KFKF-RFRFRFRnRnRTIRTKFRDRDRnUERDRDRDRERSSDRNKDRSRSREKSRDRERER 287

Query: 491 HKKSKRSKEGKEAGSEPAPEQE 512
           ++ +R +F +F F F+F
Sbjct: 288 ERERERERE-RERERERERERE 308




Score = 137 (20.6 bits), Expect = 1.2e-05, Sum P(2) = 1.2e-05
Identities = 48/152 (31%), Positives = 68/152 (44%)

Query: 364 APSWPSLVDTSKQWDYYARREKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPS 422
           AP P + T + + E RD R+ + RD + E E+ + +E+ER
Sbjct: 143 APLIPYPLITKEDINAIEMEEDKRDLISREISKFRDTHKKLEEEKGK-KEKERQEIEKER

Query: 423 VFNSDEERYRYREYAERGYERHRA-SREKE-ERHRER-RHREKEETRHKS-SRSNSRRRH 478
           + ER R RE ER ER R REKE ER RER R R+++ T+ + R R R
Sbjct: 202 R-ERERERERERERREREREREREREREKEKERERERERDRDRDRTKERDRDRDRERDRD

Query: 479 ESEEGDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512
           E S R +S+ +E E E+E
Sbjct: 261 RDRERSSDRNKDRSRSREKSRDRERERERERERE 294




Score = 126 (18.9 bits), Expect = 1.8e-04, Sum P(2) = 1.8e-04
Identities = 41/149 (27%), Positives = 66/149 (44%)

Query: 375 KQWDYYARREKDRDRERDRDRERDRDRDRERERTRERERERDHSPT PSVFNSD--EE 429
           K W+ R+K R+ E++ +RE +R R+ +E R +E D+ P + ++
Sbjct: 354 KNWEI-RERKKTREYEKEAEREEERRREMAKEAKRLKEFLEDYDDDRDDPKYYRGSALQK

Query: 430 RYRYREYAERGYERHRASREKEERHRERR HREKEETRHKSSRSNSRRRHES--E 481
           R R RE ER R REKEE R+ H + + + + RRR ' +
Sbjct: 413 RLRDREKEMEADERDR-KREKEELEEIRQRLLAEGHPDPDAELQRMEQEAERRRQPQIKQ 471

Query: 482 EGDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512
           E +S + K+ K K + E PEQ+
Sbjct: 472 EPESEEEEEEKQEKEEKREEPMEEEEEPEQK 502




Score = 124 (18.6 bits), Expect = 3.0e-04, Sum P(2) = 3.0e-04
Identities = 41/141 (29%), Positives = 65/141 (46%)

Query: 380 YARREKDRD-RERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYAE 438
           Y R K+ + RER + RE +++ + RE ER RE +E + + D++R + Y
Sbjct: 349 YQERLKNWEIRERKKTREYEKEAEREEERRREMAKEAKRLKE-FLEDYDDDRDDPKYYRG 407

Query: 439 RGYERHRASREKEERHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRS 497
           ++ REKE ER R REKEE R+ H ++R++ +R
Sbjct: 408 SALQKRLRDREKEMEADERDRKREKEELEEIRQRLLAEG-HPDPDAELQRMEQEAERRRQ 466

Query: 498 KEGKEAGSEPAPEQESTEATPAE 520
           + K+ EP E+E E E
Sbjct: 467 PQIKQ EPESEEEEEEKQEKE 486




Score = 121 (18.2 bits), Expect = 6.2e-04, Sum P(2) = 6.2e-04
Identities = 43/149 (28%), Positives = 67/149 (44%)

Query: 364 APSWPSLVDTSKQWDYYARREKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPS 422
           AP P + T + + E RD R+ + RD + E E+ + +E+ER
Sbjct: 143 APLIPYPLITKEDINAIEMEEDKRDLISREISKFRDTHKKLEEEKGK-KEKERQEIEKE- 200

Query: 423 VFNSDEERYRYREYAERGYERHRASREKEERHRERRHREKEETRHKSSRSNSRRRHESEE 482
           + ER R RE R ER R RE+E + R RE+E R + R+ R R E
Sbjct: 201 --RRERERERERERERRERERER-EREREREKEKERERERERDRDRD-RTKERDRDRDRE 256

Query: 483 GDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512
           D R + + S R+K+ + E + ++E
Sbjct: 257 RDRDR-DRERSSDRNKD-RSRSREKSRDRE 284




Score = 105 (15.8 bits), Expect = 3.1e-02, Sum P(2) = 3.1e-02
Identities = 25/73 (34%), Positives = 33/73 (45%)

Query: 428 EERYRYREYAERGYERHRASREKE-ERHRERRHREKEETRHKSSRSNSRRRHESEEGDSH 486
           EE +E + E+ R RE+E ER RERR RE+E R + REE
Sbjct: 184 EEEKGKKEKERQEIEKERRERERERERERERREREREREREREREKEKERERERERDRDR 243

Query: 487 RRHKHKKSKRSKE 499
           R K + R +E
Sbjct: 244 DRTKERDRDRDRE 256




Score = 105 (15.8 bits), Expect = 3.1e-02, Sum P(2) = 3.1e-02
Identities = 31/87 (35%), Positives = 45/87 (51%)

Query: 382 RREKDRDRERDRDRERDRDRDRER-ERTRERERERDHSPTPSVFNSDEERYRYREYAERG 440
           +R +DR++E + D ERDR R++E E R+R H P P D E R + AER
Sbjct: 412 KRLRDREKEMEAD-ERDRKREKEELEEIRQRLLAEGH-PDP D AE LQRMEQE AERR 464

Query: 441 YERHRASREKEERHRERRHREKEETRHK 468
           + + +E E E +EKEE R +
Sbjct: 465 -RQPQIKQEPESEEEEEEKQEKEEKREE 491




Score = 46 (6.9 bits), Expect = 1.5e-16, Sum P(2) = 1.5e-16
Identities = 13/49 (26%), Positives = 21/49 (42%)

Query: 54 AENGVPKPKVTETEDDSDSDSDDDEDDVHVTIGDIKTGAPQYGSYGTAP 102
A NG +P+ +D+ D + D + G I+ +Y S AP
Sbjct: 70 ASNGNARPETVTNDDEEALDEETKRRDQMIK-GAIEVLIREYSSELNAP 117 

Score = 46 (6.9 bits), Expect = 1.8e-04, Sum P(2) = 1.8e-04
Identities = 14/53 (26%), Positives = 21/53 (39%) 

Query: 30 ENEVERPEEENASANPPSGIEDETAENGVPKPKVTETEDDSDSDSDDDEDDVH 82 

+EERE EE E ++EED D ++DE+D + 

Sbjct: 282 DRERERERERERERERERERERER-EREREREREREREKDKKRDREEDEEDAY 333 

Score = 44 (6.6 bits), Expect = 2.0e-13, Sum P(2) = 2.0e-13 
Identities = 13/60 (21%), Positives = 21/60 (35%) 

Query: 20 DEEEEWLYGDENEVERPEEENASANPPSGIEDETAENGVPKPKVTETEDDSDSDSDDDED 79 

++E +++EERE + E K+EEDDD +D 

Sbjct: 191 EKERQEIEKERRERERERERERERRERERERERERERERKEKERERERERDRDRDRTKERD 250
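The Identities and Positives percentages in these alignments appear to be truncated to whole numbers rather than rounded: 21/49 is 42.9% but is printed as 42%, and 13/60 (21.7%) is printed as 21%. Assuming that convention, which is inferred from the figures in this report and not taken from the BLAST documentation, the Python sketch below reproduces the printed values using integer arithmetic. `blast_percent` and `format_hsp_stats` are hypothetical helper names.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated toward zero (convention inferred from this report)."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * matches) // aligned


def format_hsp_stats(identities: int, positives: int, length: int) -> str:
    """Render an 'Identities = ..., Positives = ...' line in the style used above."""
    return (
        f"Identities = {identities}/{length} ({blast_percent(identities, length)}%), "
        f"Positives = {positives}/{length} ({blast_percent(positives, length)}%)"
    )
```

Integer floor division avoids floating-point edge cases, such as 55/141 landing just above or below 39.0, that could flip a rounded result.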



Pedant information for DKFZphutel_17k7, frame 3



Report for DKFZphutel_17k7.3



[LENGTH] 520 

[MW] 58375.30 

[pI] 5.41

[HOMOL] PIR:S62454 hypothetical protein SPAC22G7.10 - fission yeast (Schizosaccharomyces pombe) 3e-18



[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YJR093c] 2e-13

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YJR093c] 2e-13

[PROSITE] MYRISTYL 9

[PROSITE] AMIDATION 1

[PROSITE] CK2_PHOSPHO_SITE 18

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 12

[PROSITE] ASN_GLYCOSYLATION 2

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 35.00 %



SEQ MSAGEVERLVSELSGGTGGDEEEEWLYGDENEVERPEEENASANPPSGIEDETAENGVPK 

SEG xxxxxxxxxx 

PRD cccchhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PKVTETEDDSDSDSDDDEDDVHVTIGDIKTGAPQYGSYGTAPVNLNIKTGGRVYGTTGTK 

SEG . . .xxxxxxxxxxxxxxxxx 

PRD cceeeecccccccccccccceeeeeccccccccccccccccceeeeeecccceeeccccc 

SEQ VKGVDLDAPGSINGVPLLEVDLDSFEDKPWRKPGADLSDYFNYGFNEDTWKAYCEKQKRI 

SEG

PRD ceeeccccccccccceeeeccccccccccccccccccccccccccccchhhhhhhhhhhh 

SEQ RMGLEVIPVTSTTNKITVQQGRTGNSEKETALPSTKAEFTSPPSLFKTGLPPSRRLPGAI 

SEG 



PRD hhhheeeeeccccceeeeeeecccccccccccccceeeeccccceeeecccccccccccc

SEQ DVIGQTITISRVEGRRRANENSNIQVLSERSATEVDNNFSKPPPFFPPGAPPTHLPPPPF

SEG xxxxxxxxxxxxxxxxxxx 

PRD ccccceeeeeecccccccccccceeecccccccccccccccccccccccccccccccccc 

SEQ LPPPPTVSTAPPLIPPPGFPPPPGAPPPSLIPTIESGHSSGYDSRSARAFPYGNVAFPHL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccceeeccc 

SEQ PGSAPSWPSLVDTSKQWDYYARREKDRDRERDRDRERDRDRDRERERTRERERERDHSPT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx . . . . 

PRD ccccccccceeeccccchhhhhhhhhhccccccccccccccchhhhhhhhhhhhcccccc 

SEQ PSVFNSDEERYRYREYAERGYERHRASREKEERHRERRHREKEETRHKSSRSNSRRRHES 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccchhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccc 

SEQ EEGDSHRRHKHKKSKRSKEGKEAGSEPAPEQESTEATPAE 

SEG xx. .xxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccc 
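The SEG lines above mark low-complexity residues with x, and the report summarizes them as a low-complexity share of 35.00 %. SEG itself uses a two-stage compositional-complexity algorithm. The Python sketch below is only a simplified stand-in: it masks every residue covered by a fixed-length window whose Shannon entropy falls below a cutoff, then reports the masked percentage. The window of 12 and the cutoff of 2.2 bits echo SEG's commonly cited defaults, but they are assumptions here, and the result will not necessarily match the report.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq: str, window: int = 12, cutoff: float = 2.2) -> str:
    """Replace residues in low-entropy windows with 'x' (simplified; not the SEG algorithm)."""
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < cutoff:
            for i in range(start, start + window):
                masked[i] = True
    return "".join("x" if m else aa for aa, m in zip(seq, masked))


def low_complexity_percent(seq: str, **kwargs) -> float:
    """Percentage of residues masked by low_complexity_mask."""
    if not seq:
        return 0.0
    return 100.0 * low_complexity_mask(seq, **kwargs).count("x") / len(seq)
```

A repetitive stretch such as the RERERE... region of this protein has an entropy of about 1 bit per residue and would be masked, whereas a window of twelve different residues scores log2(12), about 3.58 bits, and would not.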



Prosite for DKFZphutel_17k7.3



PS00001 40->44 ASN_GLYCOSYLATION PDOC00001
PS00001 278->282 ASN_GLYCOSYLATION PDOC00001
PS00005 169->172 PKC_PHOSPHO_SITE PDOC00005
PS00005 193->196 PKC_PHOSPHO_SITE PDOC00005
PS00005 206->209 PKC_PHOSPHO_SITE PDOC00005
PS00005 214->217 PKC_PHOSPHO_SITE PDOC00005
PS00005 233->236 PKC_PHOSPHO_SITE PDOC00005
PS00005 268->271 PKC_PHOSPHO_SITE PDOC00005
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005
PS00005 373->376 PKC_PHOSPHO_SITE PDOC00005
PS00005 469->472 PKC_PHOSPHO_SITE PDOC00005
PS00005 474->477 PKC_PHOSPHO_SITE PDOC00005
PS00005 485->488 PKC_PHOSPHO_SITE PDOC00005
PS00005 494->497 PKC_PHOSPHO_SITE PDOC00005
PS00006 2->6 CK2_PHOSPHO_SITE PDOC00006
PS00006 17->21 CK2_PHOSPHO_SITE PDOC00006
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006
PS00006 64->68 CK2_PHOSPHO_SITE PDOC00006
PS00006 66->70 CK2_PHOSPHO_SITE PDOC00006
PS00006 70->74 CK2_PHOSPHO_SITE PDOC00006
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006
PS00006 74->78 CK2_PHOSPHO_SITE PDOC00006
PS00006 84->88 CK2_PHOSPHO_SITE PDOC00006
PS00006 144->148 CK2_PHOSPHO_SITE PDOC00006
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006
PS00006 215->219 CK2_PHOSPHO_SITE PDOC00006
PS00006 250->254 CK2_PHOSPHO_SITE PDOC00006
PS00006 271->275 CK2_PHOSPHO_SITE PDOC00006
PS00006 273->277 CK2_PHOSPHO_SITE PDOC00006
PS00006 340->344 CK2_PHOSPHO_SITE PDOC00006
PS00006 369->373 CK2_PHOSPHO_SITE PDOC00006
PS00006 426->430 CK2_PHOSPHO_SITE PDOC00006
PS00007 434->442 TYR_PHOSPHO_SITE PDOC00007
PS00007 152->161 TYR_PHOSPHO_SITE PDOC00007
PS00008 15->21 MYRISTYL PDOC00008
PS00008 96->102 MYRISTYL PDOC00008
PS00008 115->121 MYRISTYL PDOC00008
PS00008 130->136 MYRISTYL PDOC00008
PS00008 154->160 MYRISTYL PDOC00008
PS00008 229->235 MYRISTYL PDOC00008
PS00008 244->250 MYRISTYL PDOC00008
PS00008 289->295 MYRISTYL PDOC00008
PS00008 362->368 MYRISTYL PDOC00008
PS00009 253->257 AMIDATION PDOC00009



(No Pfam data available for DKFZphutel_17k7.3) 
DKFZphutel_18cl2



group: uterus derived 

DKFZphutel_18cl2 encodes a novel 378 amino acid protein nearly identical to human
WUGSC:H_DJ0872F07.1 protein.

The novel protein has an additional N-terminal domain, which is not present in
WUGSC:H_DJ0872F07.1.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific 
genes. 



nearly identical to human WUGSC:H_DJ0872F07.1 protein

on genomic level encoded by AC004537 (10 exons); the predicted
protein sequence AC004537_1 is only partially correct: the first exon was not
predicted, and additional exons are predicted
(BLASTX/EST-BLAST shows that the cDNA is only partly spliced)
intron -1216-3540//-3577-5059

Sequenced by AGOWA 

Locus: /map="7q31"

Insert length: 6005 bp 

Poly A stretch at pos. 5980, polyadenylation signal at pos. 5968 



1 AGCGGGTGCT GCTAGCGGAG GCGCCATATT GGAGGGGACA AAACTCCGGC 
51 GACAGCGAGT GACACAAATA AACCCCTGGA CCCCCTTGTT CCCTCAGCTC 
101 TAAGGGCCGC GATGTTGTAC CTAGAAGACT ATCTGGAAAT GATTGAGCAG 
151 CTTCCTATGG ATCTGCGGGA CCGCTTCACG GAAATGCGCG AGATGGACCT 
201 GCAGGTGCAG AATGCAATGG ATCAACTAGA ACAAAGAGTC AGTGAATTCT 
251 TTATGAATGC AAAGAAAAAT AAACCTGAGT GGAGGGAAGA GCAAATGGCA 
301 TCCATCAAAA AAGACTACTA TAAAGCTTTG GAAGATGCAG ATGAGAAGGT 
351 TCAGTTGGCA AACCAGATAT ATGACTTGGT AGATCGACAC TTGAGAAAGC 
401 TGGATCAGGA ACTGGCTAAG TTTAAAATGG AGCTGGAAGC TGATAATGCT
451 GGAATTACAG AAATATTAGA GAGGCGATCT TTGGAATTAG ACACTCCTTC
501 ACAGCCAGTG AACAATCACC ATGCTCATTC ACATACTCCA GTGGAAAAAA 
551 GGAAATATAA TCCAACTTCT CACCATACGA CAACAGATCA TATTCCTGAA 
601 AAGAAATTTA AATCTGAAGC TCTTCTATCC ACCCTTACGT CAGATGCCTC 
651 TAAGGAAAAT ACACTAGGTT GTCGAAATAA TAATTCCACA GCCTCTTCTA 
701 ACAATGCCTA CAATGTGAAT TCCTCCCAAC CTCTGGGATC CTATAACATT 
751 GGCTCGTTAT CTTCAGGAAC TGGTGCAGGG GCAATTACCA TGGCAGCTGC 
801 TCAAGCAGTT CAGGCTACAG CTCAGATGAA GGAGGGACGA AGAACATCAA 
851 GTTTAAAAGC CAGTTATGAA GCATTTAAGA ATAATGACTT TCAGTTGGGA 
901 AAAGAATTTT CAATGGCCAG GGAAACAGTT GGCTATTCAT CATCTTCGGC 
951 ACTTATGACA ACATTAACAC AGAATGCCAG TTCATCAGCA GCCGACTCAC 
1001 GGAGTGGTCG AAAGAGCAAA AACAACAACA AGTCTTCAAG CCAGCAGTCA 
1051 TCATCTTCCT CCTCCTCTTC TTCCTTATCA TCGTGTTCTT CATCATCAAC 
1101 TGTTGTACAA GAAATCTCTC AACAAACAAC TGTAGTGCCA GAATCTGATT 
1151 CAAATAGTCA GGTTGATTGG ACTTACGACC CAAATGAACC TCGATACTGC 
1201 ATTTGTAATC AGGTAAAAGT CTGTTATATC TATAAAAGTA TAATCTGAAT 
1251 AAACTAGAAG GAAGAGAACT ATTTCATTTT TAAGCACTTT TTTAAACTCA 
1301 CTTAAAATAC CTTTGCTTTA TTTGTATACT TTTCTCCCCC TTCTTACAAA 
1351 AGTGACATTT GCTGTAAATA CTGAGTATAA AGAAAAATGT TACCCATAAT 
1401 CCTAGCCCTC AGATACAACC TGTAACTAAA CATTTTTGGT ATACCACTAC 
1451 CATATACCTC ATGTGCACAT TGGCTGCCTT AATAAAATAC AACAGACTGG 
1501 GTAGCTTAAA CAACAGAAAA TAATTTTCTC ACAGGTATGA AGGCTGGGAA 
1551 GTCCAAGATC AAGGTGTCCA CTGACTCAGT TCTGGAGGAG GGCTCCCTTC 
1601 CTAGATGGAG ACTGCTGCCT TCTCACCGGG TCCTCACATG ATAGAGGGAG 
1651 AAAGAGTGTG CTCTGGTGTC TTTTCTTATA AGGGCACCAG CCTTGTCAGA 
1701 GTAGGACCCC ACTCTATGAC CTCATTTAAC CTTTACCACC TCCTCACAGG 
1751 CCCTGTTTCC AATTATAGTC ACGTTGGGGG TTAGGGCTTC AACATATGAT 
1801 TTTGAGACAT AAGCTTGCAT TTCATAACAC GTGTCTATGC AGATTTGCAC 
1851 ATGCATGTGT GTATAAGTTT GTCAGTAGGA ACCACAGTGT ATACTTTCTT 
1901 GTTACTGGCT TTTTTCTCTA AATCAGGTAT ACCGAACATG ATTTTTCTTT 
1951 AAGATCATAT TTTTAATTTT CACATAGTTA TCTCTTATGC CATCCAGTGT 
2001 AGTTTTCTTA ACCAATACCT AGCTATAGAT TATATTAGTG GTTTTAATTT 
2051 GTTTGAAATT AGGGATAATA TTACGATAGG CATTTTTTAA ATGTAATCCA 
2101 TTTTATACAT CTAATTTCTT GGATAATCTT TTAGAAATAA AATTAGGCTG 
2151 TAAATATTTG ACAGACACCA AAATATATTT TCTAGAAATT TATTACCAAA
2201 AATTAATAAA CATACCGGTT TACTAAACCC TGTCCAACAC TGGATATTAT 
2251 TTTCTTTTAA AAACTAAGTA CCAATTTGGT AGTTTTATAT TATGATTGTT 
2301 TTAAATACAC TAGTATTATT GAAGTTGGAC ATTTTTTGAC CATTTTTGTT 
2351 TTTTACATTA TGAATCGACT CCTAATGGTG TCGGCTGATT TTTCTATTGT 
2401 TTTTGTTATG TACTCTAAAT ATTTGCTTGA TTTAGTTTTT TAAAAATAAT

2451 TCTAAAATTT TAATTTTATG TAGTTATGAC TGTTAATTTT TTTTTATGAA 

2501 GCAAGCCATG GATTATATAC TTAGAAGGGC TTTCTCTTTG GCTCTTCTTT 

2551 CTACAAAAAA TTGTCTTGTA TAATATTTTC TCCTAGTTTT TATATGGTTT 

2601 TGTCTAGTTC TTTGCATGCT TCAGTTTCTT CACATTTAAG ACTTAGTCTA 

2651 TCAGCAGATT ATTGTGTCTA ACAGTATGAG TTGCCAGTCT GATTTTTAAA 

2701 AATTTTAACA ATTTGTTAGC TGTTCCACTA TCACCCGATA AACATTTTTC 

2751 AGTACAAATG ATAGAAAAGC ATATCCTGTA TCCTGACAAC AAAAGTAGAT 

2801 TACTTGCAAA AGAACAAAAT CAGACTGAAC CTAGAGTTTT CCTCTGTAAC

2851 ACTAAAAAAC TAGAAGGTGA TGGAATATGT CTGTAGAGCT TTCAGGGAAA 

2901 AATTAAGAGC CCCCAAAAAC TTGATATTCA GAGAAGTTAT TTCTCTGCAT 

2951 AGGACCATGT AAATATATTT TCACTCATGC AGAGAATCAG AAGATATGCC 

3001 ATCTAGTTAA TCCTGTCTGA AAAATTATTC AATCCACTGA GAACTTCAGT 

3051 GAACTCAAGA ATTAGCAAGT TATGCCCTAA AGTGCTGGTG ATGAAGAGCA 

3101 AAAGAAAAAT GAGAAAGGAC ATAAAATAGA TAAGTTTAGA AGTTTCAAGG 

3151 AAGGAGACTA TTAATTGCAA AAATATATAT GACCTAATGT GACCCAAGAA 

3201 GTAAAAACTT TCAGTAAGTA AATAATCAAG AAAGGAACTT AAAATTTTTA 

3251 CAATAAGAAC TACCCAGAAA GATGACTCCT TCATCCGGGT GATTTATATG 

3301 TCAAGTTCTT CCAGACTTCT GAAGGGCAGA TAATTCCTGT GCATTTCTTC 

3351 CCACCCTTGC CCCACCCTGC CCAAAAGAGT ATTTCAGGAA AAAATTATTA 

3401 TACCTTGATT CTCAATGTAA TTGTATATTC AGTGTATTTC CCTTTATTTT 

3451 CCAGCAGTAT CATACATAAA CAGTTAATTG GTATCTAGGT GTTTGTTACA 

3501 TAGTCATAAT AAAGACATTT AATTTTTTTT AACTAGGTAT CTTATGGTGA 

3551 GATGGTGGGA TGTGATAACC AAGATGTAAG TATTACATTT TTCTATTTAG 

3601 GAATGAAAAA AATCACAGGT TGTTATTACT TGAATATTTG TCTTATTTGC 

3651 TGTATGGTTT GGTCTAAGAA AACAGGTTTG CAGGTATATT AGTTATGTTA 

3701 TGCTAATGCT AGAATATTCC TCTTCAAAAT AGGGTAGTGT CCCTTAATGT 

3751 GTTCCCTATT TTAATTTTTA AAGCTAATTT TATGGTTTTA TGTGCAGATT 

3801 GTCTCAGAAG TGTTATGTTG TATGAAAATT ATAAATACCC TCCTTTCCCT 

3851 TTACTAAAAA ATACTGTGTT TACTAGAATC CAGTTCATTT ATCACATTGA 

3901 AGAAATGGAA TTTTAAAACA ATTCATTCTT TCAGGCTGCA CCGTGCTAAA 

3951 GTGAAGGGTG GGATAATTGA GGATCTAATG TGAGATTATC TTCCTCTCAT 

4001 GAGTATAATA TTTTTTCCTG TACTCTGCAG GTGTCAGCTG ATAAGAGCCA 

4051 CCCCTGATCT AAAAAGTAAA GGAAATTTGA AAGGAAGGAA TTCTTGGTTT 

4101 TTAGGAGACT TAATTTTAGT TAGAGATACG TTTTTTATTC AATACTGAGA 

4151 ATATTGTTGT CTAGTAATTT TGACTCCCTC CTTATTTAGT AGTGACAGGA 

4201 TCCTAAGATT AACAAGAGTT TTAAATTTGT AAAACAATCT GAAGATTGAG 

4251 GGAGCTGGCT AGGTGCATTA AAATGTGTAC TTTTCCTAGA CCTGATAGGG 

4301 TTACAGCAAC ATGCTCACGT AGATTGGGAC AGAGCCTCCT TCTGTTTCCC 

4351 TGTCTAGAAT CCCTTGTAGG CTGTTTGTGG TTGTTGCAAA AACAATATTG 

4401 CCCAACCATT TCAAGAACAT CACTGTAAAC TCTTCTGGGG CAGTTAGTGA 

4451 AAATGATGAA TGAGATTTCT ATGAGTACCA GCATCATGCT TCTCTGATTC

4501 TTCTTATTCC CAGTTGTGCT CTTCTGAGTG CTAAGACTTT CATGAAAGAG 

4551 TTTTCTGCTT AATATGTTTC AAAGAGGAAT AATTTTTCTC TACATTTCAA 

4601 GGAATAGAAA CACCCACGTA GGAAATGCAG GGCATAAGAC ATAAATTAAT 

4651 GTCTTTAATT ACAATCAGCT TATTCTACTT TATGAGACAG CAAATAAGGC

4701 TGACTATTAA ATAAAATCTT AAGTTATATT TACCTTCTAC ATAGAAGATT 

4751 CATCCCACTT CTTTTTGCCC TTGAAAGCTG AAAACTAGTG AATTTTCATT 

4801 CATTAGGATG AGGGGACTAG ATTACATGGA CCTCAGGATT CTTGAAGATG 

4851 CATAATTTTT CTGTGCCTTC ATTTCCTCAT TCCTGAAGCT TATCATTTAG 

4901 TCTAAATGAT GTCTAAATAA TCTAGATCTA AAAATTCTGA TGTCACACAT 

4951 CTAATTATTG TTAAATTAAA TGGATTATTC AGTCTCCTGA GCATATTTTA 

5001 ATATACTCTC TTGTCTTCAG AAGTACTGAA AACTTGTTTT TTGCAATTTT 

5051 GCTTTCTAGT GCCCTATAGA ATGGTTCCAT TATGGCTGCG TTGGATTGAC 

5101 AGAGGCACCA AAAGGCAAAT GGTACTGTCC ACAGTGCACT GCTGCAATGA 

5151 AGAGAAGAGG CAGCAGACAC AAATAAAGGT GGTCCTTTTG TTTGATGAAG 

5201 AAATAAACTT CAGCTGAAGA TTTTATATAG GACTTTAAAA AGAAGAGAAG 

5251 AGAAAGAAGA AACAATGCAT TTCCAGGCAA CCACTTAAAG GATTTACATA 

5301 GACAATCCTA TAAGATCTTG AACTTGAATT TTATGGGTTG TATTTTAATA 

5351 ATGTAAGTAA ATTATTTATG CACTCCTGGT GTGCTATGAA TATTATTCCA 

5401 GTTAGCCTTG GATTATTTCA GTGGCCAACA TATGCAGACA TTTGTACTCC 

5451 TCAACCATTT TCTCAAAGTA ATGGGCATTC TATGATTTAG ACTTCAAGGA 

5501 ATTCCAATGA TGAAGATTTT AAGGAAAGTA TTTTATATTC AACAGGTATA 

5551 TTCTGCTGCA TGTACTGTAC TCCAGAGCTG TTATGTAACA CTGTATATAA 

5601 ATGGTTGCAA AAAAAAAAAA AAGTCAGTGC TTCTAAAAAG AATTTAAGAT 

5651 AATGGTTTTT AAAATGCCTT TATAATAAGC TTTGTTTCTT TGTGAAACTA 

5701 ATTCAGCAGG CTGAAGGAAA TGGTTCATGT GATAATGTGG GCTGGTATCC 

5751 TCTAGAGTAC CTGGGTACAT AAACAGAAAC TCCTGTAGGT AAAAAGTAAT 

5801 TTGTGCCATT AGTCTTTCTA TGTTTCTGCA TCCAGATAGA GTGCAGTTCA 

5851 TGAGGGAGGG GGCGGGGGAC TGAAGGGGAA AGGGCGTTAA AGTGATACAT 

5901 TTTTATACCA AATGTGTTTA TTTTTTTGTG CAAGTAATCC TTAAAATTGC 

5951 AATTGTATTA GGTGTTAAAA TAAAGTTTTT AAAAAATTAA AAAAAAAAAA 
6001 AAAAA 



BLAST Results 



Entry HSG20547 from database EMBL: 
HSG20547 human STS A005W09.
Length = 154
Minus Strand HSPs:

Score = 770 (115.5 bits), Expect = 2.9e-26, P = 2.9e-26
Identities = 154/154 (100%) 



Medline entries 



98101645: 

The candidate tumour suppressor p33ING1 cooperates with p53 in cell
growth control. 



Peptide information for frame 1 



ORF from 112 bp to 1245 bp; peptide length: 378 
Category: similarity to known protein 
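This entry is reported for frame 1 and DKFZphutel_17k7 for frame 3. Assuming frames are numbered 1 to 3 from the first nucleotide of the insert, which is consistent with both entries, the frame follows directly from the 1-based ORF start: 112 gives frame 1 and 168 gives frame 3. The same arithmetic maps a residue number, such as a PROSITE site position, back to its codon in the cDNA. A small Python sketch with illustrative function names:

```python
from typing import Tuple


def reading_frame(orf_start: int) -> int:
    """Forward-strand frame (1, 2 or 3) of a 1-based ORF start coordinate."""
    if orf_start < 1:
        raise ValueError("coordinates are 1-based")
    return (orf_start - 1) % 3 + 1


def residue_codon(orf_start: int, residue: int) -> Tuple[int, int]:
    """1-based, inclusive cDNA coordinates of the codon encoding a given residue."""
    if residue < 1:
        raise ValueError("residue numbers are 1-based")
    first = orf_start + 3 * (residue - 1)
    return first, first + 2
```

For example, residue 378 of this protein maps to nucleotides 1243-1245, which is the last codon before the stated ORF end of 1245. The TGA stop follows at 1246-1248.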



1 MLYLEDYLEM IEQLPMDLRD RFTEMREMDL QVQNAMDQLE QRVSEFFMNA 
51 KKNKPEWREE QMASIKKDYY KALEDADEKV QLANQIYDLV DRHLRKLDQE
101 LAKFKMELEA DNAGITEILE RRSLELDTPS QPVNNHHAHS HTPVEKRKYN 
151 PTSHHTTTDH IPEKKFKSEA LLSTLTSDAS KENTLGCRNN NSTASSNNAY 
201 NVNSSQPLGS YNIGSLSSGT GAGAITMAAA QAVQATAQMK EGRRTSSLKA 
251 SYEAFKNNDF QLGKEFSMAR ETVGYSSSSA LMTTLTQNAS SSAADSRSGR
301 KSKNNNKSSS QQSSSSSSSS SLSSCSSSST VVQEISQQTT VVPESDSNSQ
351 VDWTYDPNEP RYCICNQVKV CYIYKSII 

BLASTP hits 

Entry AF044076_1 from database TREMBL:

"ING1"; product: "candidate tumor suppressor p33ING1"; Homo
sapiens candidate tumor suppressor p33ING1 (ING1) mRNA, complete
cds. Homo sapiens (human)
Length = 279

Score = 162 (57.0 bits), Expect = 1.1e-09, P = 1.1e-09
Identities = 48/183 (26%), Positives = 92/183 (50%) 

Entry AC004537_1 from database TREMBL:

gene: "WUGSC:H_DJ0872F07.1"; Homo sapiens PAC clone DJ0872F07 from
7q31, complete sequence.

Score = 1814, P = 3.7e-187, identities = 358/358, positives = 358/358

Entry CEY51H1A_1 from database TREMBL:

gene: "Y51H1A.4"; Caenorhabditis elegans cosmid Y51H1A

Score = 213, P = 3.7e-15, identities = 37/123, positives = 82/123 



Alert BLASTP hits for DKFZphutel_18cl2, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_18cl2, frame 1 



Report for DKFZphutel_18cl2.1



[LENGTH] 378

[MW] 42275.72

[pI] 5.72

[HOMOL] TREMBL:AC004537_1 gene: "WUGSC:H_DJ0872F07.1"; Homo sapiens PAC clone DJ0872F07 complete sequence, 1e-157

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YHR090c] 8e-05

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL097c] 2e-04

[PROSITE] MYRISTYL 3

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 4

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 5

[KW] All_Alpha

[KW] LOW_COMPLEXITY 20.63 %
[KW] COILED_COIL 7.94 %
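Pedant reports a molecular weight and an isoelectric point for each predicted protein. The Python sketch below shows one common way to estimate both. The molecular weight is the sum of average residue masses plus one water, and the pI is found by bisecting on pH until the Henderson-Hasselbalch net charge reaches zero. The residue masses are standard average values. The pKa set, however, is one textbook choice assumed here for illustration; tools differ in their pKa tables, so this sketch is not expected to reproduce the reported 42275.72 Da and pI 5.72 exactly.

```python
# Average residue masses (Da) of the 20 standard amino acids, peptide-bond form.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Assumed pKa values (one textbook set); other tools use different tables.
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def molecular_weight(seq: str) -> float:
    """Average mass of a linear peptide; raises KeyError on non-standard residues."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH (Henderson-Hasselbalch, independent ionizable groups)."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH: net charge decreases monotonically, so pI is where it crosses zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection is used because the net charge is monotonic in pH, so it converges without needing derivatives. A single glycine, for instance, lands at about 6.0, midway between its two terminal pKa values.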



SEQ MLYLEDYLEMIEQLPMDLRDRFTEMREMDLQVQNAMDQLEQRVSEFFMNAKKNKPEWREE
SEG
PRD ccchhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhh
COILS

SEQ QMASIKKDYYKALEDADEKVQLANQIYDLVDRHLRKLDQELAKFKMELEADNAGITEILE
SEG
PRD hhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccchhhhh
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ RRSLELDTPSQPVNNHHAHSHTPVEKRKYNPTSHHTTTDHIPEKKFKSEALLSTLTSDAS
SEG
PRD hhccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhcccc
COILS

SEQ KENTLGCRNNNSTASSNNAYNVNSSQPLGSYNIGSLSSGTGAGAITMAAAQAVQATAQMK
SEG xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx . .
PRD cccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhh
COILS

SEQ EGRRTSSLKASYEAFKNNDFQLGKEFSMARETVGYSSSSALMTTLTQNASSSAADSRSGR
SEG xxxxxxxxxxxx
PRD hccccccccchhhhhhccccccccccccccccccccccceeeeecccccccccccccccc
COILS

SEQ KSKNNNKSSSQQSSSSSSSSSLSSCSSSSTVVQEISQQTTVVPESDSNSQVDWTYDPNEP
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD ccccccccccccccccccccceeecccccccccccccccccccccccccceeeecccccc
COILS

SEQ RYCICNQVKVCYIYKSII
SEG
PRD eeeeceeeeeeeeeeccc
COILS



Prosite for DKFZphutel_18cl2.1



PS00001 190->194 ASN_GLYCOSYLATION PDOC00001
PS00001 191->195 ASN_GLYCOSYLATION PDOC00001
PS00001 203->207 ASN_GLYCOSYLATION PDOC00001
PS00001 288->292 ASN_GLYCOSYLATION PDOC00001
PS00001 306->310 ASN_GLYCOSYLATION PDOC00001
PS00002 218->222 GLYCOSAMINOGLYCAN PDOC00002
PS00004 243->247 CAMP_PHOSPHO_SITE PDOC00004
PS00005 64->67 PKC_PHOSPHO_SITE PDOC00005
PS00005 247->250 PKC_PHOSPHO_SITE PDOC00005
PS00005 298->301 PKC_PHOSPHO_SITE PDOC00005
PS00006 142->146 CK2_PHOSPHO_SITE PDOC00006
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006
PS00006 292->296 CK2_PHOSPHO_SITE PDOC00006
PS00006 349->353 CK2_PHOSPHO_SITE PDOC00006
PS00008 186->192 MYRISTYL PDOC00008
PS00008 214->220 MYRISTYL PDOC00008
PS00008 219->225 MYRISTYL PDOC00008
PS00009 241->245 AMIDATION PDOC00009
PS00009 298->302 AMIDATION PDOC00009
PS00013 315->326 PROKAR_LIPOPROTEIN PDOC00013



(No Pfam data available for DKFZphutel_18cl2.1) 
DKFZphutel_18il9



group: transcription factors 

DKFZphutel_18il9 encodes a novel 759 amino acid protein with similarity to the mutant
sterol regulatory element binding protein-2 (SREBP-2) of Cricetulus griseus.

The SREBP-2 protein is embedded in the membranes of the nucleus and endoplasmic reticulum. In
cholesterol-depleted cells, SREBPs are cleaved to release soluble NH2-terminal fragments
that enter the nucleus and activate genes encoding the low density lipoprotein receptor and
enzymes of cholesterol synthesis. The new protein is a putative transcription factor capable
of protein-protein interaction via a LIM domain, and it additionally shows similarity to
transcription factor SF3 of the common sunflower.

The new protein can find application in modulating or blocking the expression of genes involved
in lipid metabolism.



similarity to transcription factor SF3 

complete cDNA, complete cds, EST hits 

strong similarity to mutated SREBP-2 of hamster;

similarity is not to the SREBP-2 part of the protein but to the unknown part of
the fusion protein

Sequenced by AGOWA 

Locus: /map=12 

Insert length: 3664 bp 

Poly A stretch at pos. 3647, polyadenylation signal at pos. 3636



1 GCGCTAGGTA GAGCGCCGGG ACCTGTGACA GGGCTGGTAG CAGCGCAGAG 
51 GAAAGGCGGC TTTTAGCCAG GTATTTCAGT GTCTGTAGAC AAGATGGAAT 
101 CATCTCCATT TAATAGACGG CAATGGACCT CACTATCATT GAGGGTAACA 
151 GCCAAAGAAC TTTCTCTTGT CAACAAGAAC AAGTCATCGG CTATTGTGGA 
201 AATATTCTCC AAGTACCAGA AAGCAGCTGA AGAAACAAAC ATGGAGAAGA 
251 AGAGAAGTAA CACCGAAAAT CTCTCCCAGC ACTTTAGAAA GGGGACCCTG 
301 ACTGTGTTAA AGAAGAAGTG GGAGAACCCA GGGCTGGGAG CAGAGTCTCA 
351 CACAGACTCT CTACGGAACA GCAGCACTGA GATTAGGCAC AGAGCAGACC 
401 ATCCTCCTGC TGAAGTGACA AGCCACGCTG CTTCTGGAGC CAAAGCTGAC 
451 CAAGAAGAAC AAATCCACCC CAGATCTAGA CTCAGGTCAC CTCCTGAAGC
501 CCTCGTTCAG GGTCGATATC CCCACATCAA GGACGGTGAG GATCTTAAAG 
551 ACCACTCAAC AGAAAGTAAA AAAATGGAAA ATTGTCTAGG AGAATCCAGG 
601 CATGAAGTAG AAAAATCAGA AATCAGTGAA AACACAGATG CTTCGGGCAA 
651 AATAGAGAAA TATAATGTTC CGCTGAACAG GCTTAAGATG ATGTTTGAGA 
701 AAGGTGAACC AACTCAAACT AAGATTCTCC GGGCCCAAAG CCGAAGTGCA 
751 AGTGGAAGGA AGATCTCTGA AAACAGCTAT TCTCTAGATG ACCTGGAAAT 
801 AGGCCCAGGT CAGTTGTCAT CTTCTACATT TGACTCGGAG AAAAATGAGA 
851 GTAGACGAAA TCTGGAACTT CCACGCCTCT CAGAAACCTC TATAAAGGAT 
901 CGAATGGCCA AGTACCAGGC AGCTGTGTCC AAACAAAGCA GCTCAACCAA 
951 CTATACAAAT GAGCTGAAAG CCAGTGGTGG CGAAATCAAA ATTCATAAAA 
1001 TGGAGCAAAA GGAGAATGTG CCCCCAGGTC CTGAGGTCTG CATCACCCAT 
1051 CAGGAAGGGG AAAAGATTTC TGCAAATGAG AATAGCCTGG CAGTCCGTTC 
1101 CACCCCTGCC GAAGATGACT CCCGTGACTC CCAGGTTAAG AGTGAGGTTC 
1151 AACAGCCTGT CCATCCCAAG CCACTAAGTC CAGATTCCAG AGCCTCCAGT 
1201 CTTTCTGAAA GTTCTCCTCC CAAAGCAATG AAGAAGTTTC AGGCACCTGC 
1251 AAGAGAGACC TGCGTGGAAT GTCAGAAGAC AGTCTATCCA ATGGAGCGTC 
1301 TCTTGGCCAA CCAGCAGGTG TTTCACATCA GCTGCTTCCG TTGCTCCTAT 
1351 TGCAACAACA AACTCAGTCT AGGAACATAT GCATCTTTAC ATGGAAGAAT 
1401 CTATTGTAAG CCTCACTTCA ATCAACTCTT TAAATCTAAG GGCAACTATG 
1451 ATGAAGGCTT TGGGCACAGA CCACACAAGG ATCTATGGGC AAGCAAAAAT
1501 GAAAACGAAG AGATTTTGGA GAGACCAGCC CAGCTTGCAA ATGCAAGGGA 
1551 GACCCCTCAC AGCCCAGGGG TAGAAGATGC CCCTATTGCT AAGGTGGGTG
1601 TCCTGGCTGC AAGTATGGAA GCCAAGGCCT CCTCTCAGCA GGAGAAGGAA 
1651 GACAAGCCAG CTGAAACCAA GAAGCTGAGG ATCGCCTGGC CACCCCCCAC 
1701 TGAACTTGGA AGTTCAGGAA GTGCCTTGGA GGAAGGGATC AAAATGTCAA 
1751 AGCCCAAATG GCCTCCTGAA GACGAAATCA GCAAGCCCGA AGTTCCTGAG 
1801 GATGTCGATC TAGATCTGAA GAAGCTAAGA CGATCTTCTT CACTGAAGGA 
1851 AAGAAGCCGC CCATTCACTG TAGCAGCTTC ATTTCAAAGC ACCTCTGTCA 
1901 AGAGCCCAAA AACTGTGTCC CCACCTATCA GGAAAGGCTG GAGCATGTCA 
1951 GAGCAGAGTG AAGAGTCTGT GGGTGGAAGA GTTGCAGAAA GGAAACAAGT 
2001 GGAAAATGCC AAGGCTTCTA AGAAGAATGG GAATGTGGGA AAAACAACCT 
2051 GGCAAAACAA AGAATCTAAA GGAGAGACAG GGAAGAGAAG TAAGGAAGGT 
2101 CATAGTTTGG AGATGGAGAA TGAGAATCTT GTAGAAAATG GTGCAGACTC 
2151 CGATGAAGAT GATAACAGCT TCCTCAAACA ACAATCTCCA CAAGAACCCA 
2201 AGTCTCTGAA TTGGTCGAGT TTTGTAGACA ACACCTTTGC TGAAGAATTC 
2251 ACTACTCAGA ATCAGAAATC CCAGGATGTG GAACTCTGGG AGGGAGAAGT 
2301 GGTCAAAGAG CTCTCTGTGG AAGAACAGAT AAAGAGAAAT CGGTATTATG
2351 ATGAGGATGA GGATGAAGAG TGACAAATTG CAATGATGCT GGGCCTTAAA
2401 TTCATGTTAG TGTTAGCGAG CCACTGCCCT TTGTCAAAAT GTGATGCACA
2451 TAAGCAGGTA TCCCAGCATG AAATGTAATT TACTTGGAAG TAACTTTGGA
2501 AAAGAATTCC TTCTTAAAAT CAAAAACAAA ACAAAAAAAC ACAAAAAACA
2551 CATTCTAAAT ACTAGAGATA ACTTTACTTA AATTCTTCAT TTTAGCAGTG
2601 ATGATATGCG TAAGTGCTGT AAGGCTTGTA ACTGGGGAAA TATTCCACCT
2651 GATAATAGCC CAGATTCTAC TGTATTCCCA AAAGGCAATA TTAAGGTAGA
2701 TAGATGATTA GTAGTATATT GTTACACACT ATTTTGGAAT TAGAGAACAT
2751 ACAGAAGGAA TTTAGGGGCT TAAACATTAC GACTGAATGC ACTTTAGTAT
2801 AAAGGGCACA GTTTGTATAT TTTTAAATGA ATACCAATTT AATTTTTTAG
2851 TATTTACCTG TTAAGAGATT ATTTAGTCTT TAAATTTTTT AGGTTAATTT
2901 TCTTGCTGTG ATATATATGA GGAATTTACT ACTTTATGTC CTGCTCTCTA
2951 AACTACATCC TGAACTCGAC GTCCTGAGGT ATAATACAAC AGAGCACTTT
3001 TTGAGGCAAT TGAAAAACCA ACCTACACTC TTCGGTGCTT AGAGAGATCT
3051 GCTGTCTCCC AAATAAGCTT TTGTATCTGC CAGTGAATTT ACTGTACTCC
3101 AAATGATTGC TTTCTTTTCT GGTGATATCT GTGCTTCTCA TAATTACTGA
3151 AAGCTGCAAT ATTTTAGTAA TACCTTCGGG ATCACTGTCC CCCATCTTCC
3201 GTGTTAGAGC AAAGTGAAGA GTTTAAAGGA GGAAGAAGAA AGAACTGTCT
3251 TACACCACTT GAGCTCAGAC CTCTAAACCC TGTATTTCCC TTATGATGTC
3301 CCCTTTTTGA GACACTAATT TTTAAATACT TACTAGCTCT GAAATATATT
3351 GATTTTTATC ACAGTATTCT CAGGGTGAAA TTAAACCAAC TATAGGCCTT
3401 TTTCTTGGGA TGATTTTCTA GTCTTAAGGT TTGGGGACAT TATAAACTTG
3451 AGTACATTTG TTGTACACAG TTGATATTCC AAATTGTATG GATGGGAGGG
3501 AGAGGTGTCT TAAGCTGTAG GCTTTTCTTT GTACTGCATT TATAGAGATT
3551 TAGCTTTAAT ATTTTTTAGA GATGTAAAAC ATTCTGCTTT CTTAGTCTTA
3601 CCTAGTCTGA AACATTTTTA TTCAATAAAG ATTTTAATTA AAATTTGAAA
3651 AAAAAAAAAA AAAA



BLAST Results 



Entry HS512217 from database EMBL: 
human STS SHGC-14654. 
Length = 250
Minus Strand HSPs:

Score = 1202 (180.3 bits), Expect = 1.8e-46, P = 1.8e-46
Identities = 242/244 (99%)



Medline entries 



95263566: 

Three different rearrangements in a single intron truncate 
sterol regulatory element binding protein-2 and produce 
sterol-resistant phenotype in three cell lines. Role of introns 
in protein evolution. 

93258417: 

Characterization of a pollen-specific cDNA from sunflower 
encoding a zinc finger protein. 



Peptide information for frame 1 



ORF from 94 bp to 2370 bp; peptide length: 759 
Category: similarity to known protein 
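The ORF coordinates and peptide lengths in these entries are related by simple arithmetic. Assuming 1-based, inclusive coordinates whose end lies just before the stop codon (an assumption that is consistent with every ORF line in this section), the 18il9 ORF spans 2370 - 94 + 1 = 2277 bp, i.e. 759 codons. The following minimal sketch checks this relation for the reported ORFs; the function name `peptide_length` is illustrative only.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Codon count for an ORF given 1-based, inclusive coordinates.

    Assumes the reported end coordinate stops just before the termination
    codon, which matches the ORF lines in this section.
    """
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3


# (ORF start, ORF end, reported peptide length) as listed for each clone
reported = {
    "DKFZphutel_18il9": (94, 2370, 759),
    "DKFZphutel_18i4": (163, 822, 220),
    "DKFZphutel_1811": (45, 596, 184),
    "DKFZphutel_19f19": (134, 745, 204),
}

for clone, (start, end, length) in reported.items():
    computed = peptide_length(start, end)
    assert computed == length, f"{clone}: {computed} != {length}"
    print(f"{clone}: {end - start + 1} bp -> {computed} codons")
```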



1 MESSPFNRRQ WTSLSLRVTA KELSLVNKNK SSAIVEIFSK YQKAAEETNM 

51 EKKRSNTENL SQHFRKGTLT VLKKKWENPG LGAESHTDSL RNSSTEIRHR 

101 ADHPPAEVTS HAASGAKADQ EEQIHPRSRL RSPPEALVQG RYPHIKDGED 

151 LKDHSTESKK MENCLGESRH EVEKSEISEN TDASGKIEKY NVPLNRLKMM 

201 FEKGEPTQTK ILRAQSRSAS GRKISENSYS LDDLEIGPGQ LSSSTFDSEK 

251 NESRRKLELP RLSETSIKDR MAKYQAAVSK QSSSTNYTNE LKASGGEIKI 

301 HKMEQKENVP PGPEVCITHQ EGEKISANEN SLAVRSTPAE DDSRDSQVKS 

351 EVQQPVHPKP LSPDSRASSL SESSPPKAMK KFQAPARETC VECQKTVYPM 

401 ERLLANQQVF HISCFRCSYC NNKLSLGTYA SLHGRIYCKP HFNQLFKSKG 

451 NYDEGFGHRP HKDLWASKNE NEEILERPAQ LANARETPHS PGVEDAPIAK 

501 VGVLAASMEA KASSQQEKED KPAETKKLRI AWPPPTELGS SGSALEEGIK 

551 MSKPKWPPED EISKPEVPED VDLDLKKLRR SSSLKERSRP FTVAASFQST 

601 SVKSPKTVSP PIRKGWSMSE QSEESVGGRV AERKQVENAK ASKKNGNVGK 

651 TTWQNKESKG ETGKRSKEGH SLEMENENLV ENGADSDEDD NSFLKQQSPQ 

701 EPKSLNWSSF VDNTFAEEFT TQNQKSQDVE LWEGEVVKEL SVEEQIKRNR 
751 YYDEDEDEE

BLASTP hits
Entry CG22818_1 from database TREMBL:

"SREBP-2"; product: "mutant sterol regulatory element binding
protein-2"; Cricetulus griseus SRD-2 mutant sterol regulatory
element binding protein-2 (SREBP-2) mRNA, complete cds. Cricetulus
griseus (Chinese hamster)
Length = 839

Score = 1502 (528.7 bits), Expect = 3.9e-154, P = 3.9e-154
Identities = 290/380 (76%), Positives = 322/380 (84%)

Entry S28507 from database PIR:
transcription factor SF3 - common sunflower
Length = 219

Score = 212 (74.6 bits), Expect = 6.3e-18, Sum P(2) = 6.3e-18
Identities = 36/82 (43%), Positives = 55/82 (67%)

Entry NTLIMDOM_1 from database TREMBL:

"SF3"; product: "LIM-domain SF3 protein"; N.tabacum mRNA for
LIM-domain protein Nicotiana tabacum (common tobacco)
Length = 189

Score = 216 (76.0 bits), Expect = 1.0e-16, P = 1.0e-16
Identities = 42/94 (44%), Positives = 57/94 (60%)



Alert BLASTP hits for DKFZphutel_18il9, frame 1
No Alert BLASTP hits found 
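The identity and positives percentages in the BLAST summaries above appear to be truncated rather than rounded: 322/380 is 84.7 %, yet it is printed as 84 %, and 42/94 (44.7 %) is printed as 44 %. A short sketch of that convention, using only the counts reported in this entry (the helper name `blast_percent` is hypothetical):

```python
def blast_percent(count: int, aligned: int) -> int:
    """Percentage as printed in these BLAST summaries (integer division, i.e. truncated)."""
    return 100 * count // aligned


# (count, alignment length, percentage as reported above)
reported = [
    (290, 380, 76), (322, 380, 84),  # CG22818_1: identities, positives
    (36, 82, 43), (55, 82, 67),      # S28507
    (42, 94, 44), (57, 94, 60),      # NTLIMDOM_1
]

for count, aligned, pct in reported:
    assert blast_percent(count, aligned) == pct

# Ordinary rounding would disagree for 322/380 (84.7 %):
print(round(100 * 322 / 380))  # 85, whereas the report shows 84
```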

Pedant information for DKFZphutel_18il9, frame 1 



Report for DKFZphutel_18il9.1

[LENGTH] 759

[MW] 85225.57

[pI] 6.41

[HOMOL] TREMBL:CG22818_1 gene: "SREBP-2"; product: "mutant sterol regulatory element
binding protein-2"; Cricetulus griseus SRD-2 mutant sterol regulatory element binding
protein-2 (SREBP-2) mRNA, complete cds. 1e-151

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR257w] 3e-05

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae,
YGR162w TIF4631 - mRNA cap-binding protein] 1e-04

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGR162w TIF4631 - mRNA
cap-binding protein] 1e-04

[BLOCKS] BL00478B

[PIRKW] zinc finger 9e-16

[PIRKW] dna binding 9e-16

[SUPFAM] LIM metal-binding repeat homology 9e-16

[PROSITE] MYRISTYL 6

[PROSITE] LIM_DOMAIN_1 1

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 4

[PROSITE] CK2_PHOSPHO_SITE 28

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 15

[PROSITE] ASN_GLYCOSYLATION 6

[PFAM] LIM domain containing proteins

[KW] Irregular

[KW] 3D

[KW] LOW_COMPLEXITY 5.53 %



SEQ MESSPFNRRQWTSLSLRVTAKELSLVNKNKSSAIVEIFSKYQKAAEETNMEKKRSNTENL 

SEG 

lctl- 

SEQ SQHFRKGTLTVLKKKWENPGLGAESHTDSLRNSSTEIRHRADHPPAEVTSHAASGAKADQ 

SEG 

lctl- 

SEQ EEQIHPRSRLRSPPEALVQGRYPHIKDGEDLKDHSTESKKMENCLGESRHEVEKSEISEN 

SEG 

lctl- 

SEQ TDASGKIEKYNVPLNRLKMMFEKGEPTQTKILRAQSRSASGRKISENSYSLDDLEIGPGQ 

SEG 
Ictl-

SEQ LSSSTFDSEKNESRRNLELPRLSETSIKDRMAKYQAAVSKQSSSTNYTNELKASGGEIKI 

SEG 

Ictl- 

SEQ HKMEQKENVPPGPEVCITHQEGEKISANENSLAVRSTPAEDDSRDSQVKSEVQQPVHPKP 

SEG x 

Ictl- 

SEQ LSPDSRASSLSESSPPKAMKKFQAPARETCVECQKTVYPMERLLANQQVFHISCFRCSYC 

SEG xxxxxxxxxxxxxxxx 

Ictl- ETTTTEEETTTCEEEETTEEEETTTTBTTTT 

SEQ NNKLSLGTYASLHGRIYCKPHFNQLFKSKGNYDEGFGHRPHKDLWASKNENEEILERPAQ 

SEG 

Ictl- TCBCBTTBEEEETTEEEETTTTTTTTTTCCTTTTTTTCTTT 

SEQ LANARETPHSPGVEDAPIAKVGVLAASMEAKASSQQEKEDKPAETKKLRIAWPPPTELGS 

SEG 

Ictl- 

SEQ SGSALEEGIKMSKPKWPPEDEISKPEVPEDVDLDLKKLRRSSSLKERSRPFTVAASFQST 

SEG xxxxxxxxxxxxxxxxxx 

Ictl- 

SEQ SVKSPKTVSPPIRKGWSMSEQSEESVGGRVAERKQVENAKASKKNGNVGKTTWQNKESKG 

SEG 

Ictl- 

SEQ ETGKRSKEGHSLEMENENLVENGADSDEDDNSFLKQQSPQEPKSLNWSSFVDNTFAEEFT 

SEG 

Ictl- 

SEQ TQNQKSQDVELWEGEVVKELSVEEQIKRNRYYDEDEDEE 

SEG xxxxxxx 

Ictl- 



Prosite for DKFZphutel_18il9.1

PS00001  29->33    ASN_GLYCOSYLATION  PDOC00001
PS00001  59->63    ASN_GLYCOSYLATION  PDOC00001
PS00001  92->96    ASN_GLYCOSYLATION  PDOC00001
PS00001  251->255  ASN_GLYCOSYLATION  PDOC00001
PS00001  286->290  ASN_GLYCOSYLATION  PDOC00001
PS00001  706->710  ASN_GLYCOSYLATION  PDOC00001
PS00004  52->56    CAMP_PHOSPHO_SITE  PDOC00004
PS00004  65->69    CAMP_PHOSPHO_SITE  PDOC00004
PS00004  222->226  CAMP_PHOSPHO_SITE  PDOC00004
PS00004  579->583  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  15->18    PKC_PHOSPHO_SITE   PDOC00005
PS00005  19->22    PKC_PHOSPHO_SITE   PDOC00005
PS00005  89->92    PKC_PHOSPHO_SITE   PDOC00005
PS00005  158->161  PKC_PHOSPHO_SITE   PDOC00005
PS00005  184->187  PKC_PHOSPHO_SITE   PDOC00005
PS00005  220->223  PKC_PHOSPHO_SITE   PDOC00005
PS00005  248->251  PKC_PHOSPHO_SITE   PDOC00005
PS00005  253->256  PKC_PHOSPHO_SITE   PDOC00005
PS00005  266->269  PKC_PHOSPHO_SITE   PDOC00005
PS00005  525->528  PKC_PHOSPHO_SITE   PDOC00005
PS00005  583->586  PKC_PHOSPHO_SITE   PDOC00005
PS00005  601->604  PKC_PHOSPHO_SITE   PDOC00005
PS00005  604->607  PKC_PHOSPHO_SITE   PDOC00005
PS00005  642->645  PKC_PHOSPHO_SITE   PDOC00005
PS00005  662->665  PKC_PHOSPHO_SITE   PDOC00005
PS00006  19->23    CK2_PHOSPHO_SITE   PDOC00006
PS00006  48->52    CK2_PHOSPHO_SITE   PDOC00006
PS00006  55->59    CK2_PHOSPHO_SITE   PDOC00006
PS00006  85->89    CK2_PHOSPHO_SITE   PDOC00006
PS00006  93->97    CK2_PHOSPHO_SITE   PDOC00006
PS00006  132->136  CK2_PHOSPHO_SITE   PDOC00006
PS00006  168->172  CK2_PHOSPHO_SITE   PDOC00006
PS00006  230->234  CK2_PHOSPHO_SITE   PDOC00006
PS00006  244->248  CK2_PHOSPHO_SITE   PDOC00006
PS00006  266->270  CK2_PHOSPHO_SITE   PDOC00006
PS00006  294->298  CK2_PHOSPHO_SITE   PDOC00006
PS00006  318->322  CK2_PHOSPHO_SITE   PDOC00006
PS00006  326->330  CK2_PHOSPHO_SITE   PDOC00006
PS00006  337->341  CK2_PHOSPHO_SITE   PDOC00006
PS00006  369->373     CK2_PHOSPHO_SITE  PDOC00006
PS00006  389->393     CK2_PHOSPHO_SITE  PDOC00006
PS00006  467->471     CK2_PHOSPHO_SITE  PDOC00006
PS00006  514->518     CK2_PHOSPHO_SITE  PDOC00006
PS00006  543->547     CK2_PHOSPHO_SITE  PDOC00006
PS00006  563->567     CK2_PHOSPHO_SITE  PDOC00006
PS00006  583->587     CK2_PHOSPHO_SITE  PDOC00006
PS00006  617->621     CK2_PHOSPHO_SITE  PDOC00006
PS00006  658->662     CK2_PHOSPHO_SITE  PDOC00006
PS00006  686->690     CK2_PHOSPHO_SITE  PDOC00006
PS00006  ->702        CK2_PHOSPHO_SITE  PDOC00006
PS00006  [illegible]  CK2_PHOSPHO_SITE  PDOC00006
PS00006  ->718        CK2_PHOSPHO_SITE  PDOC00006
PS00006  741->745     CK2_PHOSPHO_SITE  PDOC00006
PS00007  223->230     TYR_PHOSPHO_SITE  PDOC00007
PS00007  222->230     TYR_PHOSPHO_SITE  PDOC00007
PS00008  239->245     MYRISTYL          PDOC00008
PS00008  427->433     MYRISTYL          PDOC00008
PS00008  502->508     MYRISTYL          PDOC00008
PS00008  539->545     MYRISTYL          PDOC00008
PS00008  548->554     MYRISTYL          PDOC00008
PS00008  627->633     MYRISTYL          PDOC00008
PS00009  220->224     AMIDATION         PDOC00009
PS00009  662->666     AMIDATION         PDOC00009
PS00478  390->425     LIM_DOMAIN_1      PDOC00382


Pfam for DKFZphutel_18il9.1

HMM_NAME LIM domain containing proteins

HMM *CagCNrpIyDREivMRAMNKvWHpECFrCcdCqqPLtegdeFYErDGrI
C C++++Y+ E++ A+ V+H++CFRC+ C+ L+ G+ + ++ GRI
Query 390 CVECQKTVYPMERLL-ANQQVFHISCFRCSYCNNKLSLGT-YASLHGRI 436

HMM YCKhDYYrrFg*
YCK+++ ++F+
Query 437 YCKPHFNQLFK 447
DKFZphutel_18i4



group: uterus derived 

DKFZphutel_18i4 encodes a novel 220 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific 
genes. 



weak similarity to C.elegans D2085.2 
complete cDNA, complete cds, few EST hits 
Sequenced by AGOWA 
Locus: /map^TqSl" 
Insert length: 1568 bp 

Poly A stretch at pos. 1551, polyadenylation signal at pos. 1523 



1 GCCGAGCGGA GAGGGTAGAG ACGGGGTTTC ACCGTGTTAG CCAAGATGGT 

51 CTCGATCTCC TGACCTCGTG ATCCGCCCGC CTCGGCCTCC CAAAGTGCTG 

101 GGATTACAGG CGTGAGCCAC TGCGCCCGGC CTGTTGTACA GTTATTAAAG 

151 TTATCATTTA ACATGGAAGA AGATGAGTTC ATTGGAGAAA AAACATTCCA 

201 ACGTTATTGT GCAGAATTCA TTAAACATTC ACAACAGATA GGTGATAGTT 

251 GGGAATGGAG ACCATCAAAG GACTGTTCTG ATGGCTACAT GTGCAAAATA 

301 CACTTTCAAA TTAAGAATGG GTCTGTGATG TCACATCTAG GAGCATCTAC 

351 CCATGGACAG ACATGTCTTC CCATGGAGGA GGCTTTCGAG CTACCCTTGG 

401 ATGATTGTGA AGTGATTGAA ACTGCAGCAG CGTCCGAAGT GATTAAATAT 

451 GAGTATCATG TCTTATATTC CTGTAGCTAC CAAGTGCCTG TACTTTACTT 

501 TAGGGCAAGC TTTTTAGATG GGAGACCTTT AACTCTGAAG GACATATGGG 

551 AAGGAGTTCA TGAGTGCTAT AAGATGCGAC TGCTACAGGG ACCATGGGAC 

601 ACTATTACGC AACAGGAACA TCCAATACTT GGGCAACCCT TTTTTGTACT 

651 TCATCCCTGC AAGACGAATG AATTCATGAC TCCTGTATTA AAGAATTCTC 

701 AGAAAATCAA TAAGAATGTC AACTATATCA CATCATGGCT GAGCATTGTA 

751 GGGCCAGTTG TTGGGCTGAA TCTACCTCTG AGTTATGCCA AAGCAACGTC 

801 TCAGGATGAA CGAAATGTCC CTTAACAAGA TTCTTCTATT GAGTTTAGGA 

851 ATTGCGGCAC GAAGAATGCC AAGAGTTTAC CTGGCCAGCC CTGGCTTTAA 

901 TAGGACTGAT ACCATGGAAT ATTTCATCTC ACCAAGATGT GACATGGATT 

951 ATTTTTCCCT TGGACACAAA TGTCTACAGC AACTGATGTT TGATAGGCTG 

1001 AATGTTTAGA AGAAACACTT CAAAGGGATA CATCATGGCC AGGCATGGTG 
1051 GCTCACACCT GTAATCCAAG CACTTTGGGA GGCCAAGGTG GGAGCATCAC 

1101 TTGATCCTGG GAGTTCGAGA CCAGCCTGGG CAACATGGTG AAACCCTGTC 

1151 GGTACAAAAA AATACAAAAA TTTGCCTGTT TATGGTGGTG TGTTCCTGTA 

1201 GTCCCAGCTC CCCAGGAGGC TGAGGTGGGA GGTTGGCTTT AACCCAGGAG 

1251 GCAGAGGTTG CAGTGAGCTG AGACTGTGCC ACTGCAGTCC AGCCTGGGTG 
1301 ACAGAGCCAG ACACTGTCTC GGGAAAAAAA AAAAAAAAAA AAAGACACAT 

1351 CACTATAAAT AGCAAAAAAA CAAATCTAAC TTATTAATAC TAGGAATACC 

1401 AACATTATTA GGGCACTTGC AGGTTATTCT TTTCTAGGCC AAGTACTTCA 
1451 CTTCCATTTG TCTGACATGG AGATTGAGGG AGAAATGTAT TTGTGTGTTC 

1501 ATTTTAATGT AAGATATATA AAAATTAAAT TACTGGATTT ACCTGTCCCT 
1551 GAAAAAAAAA AAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 163 bp to 822 bp; peptide length: 220 
Category: similarity to unknown protein 
1 MEEDEFIGEK TFQRYCAEFI KHSQQIGDSW EWRPSKDCSD GYMCKIHFQI
51 KNGSVMSHLG ASTHGQTCLP MEEAFELPLD DCEVIETAAA SEVIKYEYHV 
101 LYSCSYQVPV LYFRASFLDG RPLTLKDIWE GVHECYKMRL LQGPWDTITQ 
151 QEHPILGQPF FVLHPCKTNE FMTPVLKNSQ KINKNVNYIT SWLSIVGPVV 
201 GLNLPLSYAK ATSQDERNVP 



BLASTP hits 



Entry CED2085_2 from database TREMBL: 
"D2085.2"; Caenorhabditis elegans cosmid D2085 
Length = 173 

Score = 167 (58.8 bits), Expect = 1.1e-12, P = 1.1e-12
Identities = 36/121 (29%), Positives = 64/121 (52%) 



Alert BLASTP hits for DKFZphutel_18i4, frame 1
No Alert BLASTP hits found 

Pedant information for DKFZphutel_18i4, frame 1 



Report for DKFZphutel_18i4.1



[LENGTH] 220

[MW] 25278.99

[pI] 5.34

[HOMOL] TREMBL:CED2085_2 gene: "D2085.2"; Caenorhabditis elegans cosmid D2085 2e-11

[BLOCKS] BL00221E

[PROSITE] MYRISTYL 2

[PROSITE] CK2_PHOSPHO_SITE 4

[PROSITE] PKC_PHOSPHO_SITE 2

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta



SEQ MEEDEFIGEKTFQRYCAEFIKHSQQIGDSWEWRPSKDCSDGYMCKIHFQIKNGSVMSHLG

PRD cccccccchhhhhhhhhhhhhhhhcccccccccccccccceeeeeeeeeeeccceeeeec 

SEQ ASTHGQTCLPMEEAFELPLDDCEVIETAAASEVIKYEYHVLYSCSYQVPVLYFRASFLDG 

PRD cccccccchhhhhhhhccccceeehhhhhchhhhhhhheeeeccccceeeeeeecccccc 

SEQ RPLTLKDIWEGVHECYKMRLLQGPWDTITQQEHPILGQPFFVLHPCKTNEFMTPVLKNSQ 

PRD cccccchhhhhhhhhhhhhhhhccccccccccccccccceeeeccccccccccccccccc 



SEQ KINKNVNYITSWLSIVGPVVGLNLPLSYAKATSQDERNVP 
PRD ccccccccccccceeeeccccccccceeeecccccccccc 



Prosite for DKFZphutel_18i4.1

PS00001  52->56    ASN_GLYCOSYLATION  PDOC00001
PS00005  124->127  PKC_PHOSPHO_SITE   PDOC00005
PS00005  179->182  PKC_PHOSPHO_SITE   PDOC00005
PS00006  116->120  CK2_PHOSPHO_SITE   PDOC00006
PS00006  124->128  CK2_PHOSPHO_SITE   PDOC00006
PS00006  149->153  CK2_PHOSPHO_SITE   PDOC00006
PS00006  212->216  CK2_PHOSPHO_SITE   PDOC00006
PS00008  53->59    MYRISTYL           PDOC00008
PS00008  131->137  MYRISTYL           PDOC00008

(No Pfam data available for DKFZphutel_18i4.1)
DKFZphutel_1811



group: nucleic acid management 

DKFZphtes3_15jl8 encodes a novel 184 amino acid protein with similarity to S. cerevisiae 
putative ribosomal protein YHR148w. 

The novel protein is similar to several 40S ribosomal proteins and therefore seems to be part of
the corresponding ribosomal subunit.

The new protein can find application in modulation of ribosome assembly, structure and 
function. 



strong similarity to S. cerevisiae YHR148w
complete cDNA, complete cds, EST hits, 

potential start at bp 45 matches Kozak consensus ANNatgG
gene disruption of YHR148w is lethal! 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 1076 bp 

Poly A stretch at pos. 1035, polyadenylation signal at pos. 1006 



1 GCGCGCTCTC AGCTTCGGGT CCTGCGGCTG CGGCTGCCGC CATCATGGTG 
51 CGGAAGCTTA AGTTCCACGA GCAGAAGCTG CTGAAGCAGG TGGACTTCCT 
101 GAACTGGGAG GTCACCGACC ACAACCTGCA CGAGCTGCGC GTGCTGCGGC 
151 GTTACCGGCT GCAGCGGCGG GAGGACTACA CGCGCTACAA CCAGCTGAGC 
201 CGTGCCGTGC GTGAGCTGGC GCGGCGCCTG CGCGACCTGC CCGAACGCGA 
251 CCAGTTCCGC GTGCGCGCTT CGGCCGCGCT GCTGGACAAG CTGTATGCTC 
301 TCGGCTTGGT GCCCACGCGC GGTTCGCTGG AGCTCTGCGA CTTCGTCACG 
351 GCCTCGTCCT TCTGCCGCCG CCGCCTCCCC ACCGTGCTCC TCAAGCTGCG 
401 CATGGCGCAG CACCTTCAGG CTGCCGTGGC CTTTGTGGAG CAAGGGCACG 
451 TACGCGTGGG CCCTGACGTG GTTACCGACC CCGCCTTCCT TGTCACGCGC 
501 AGCATGGAGG ACTTTGTCAC TTGGGTGGAC TCGTCCAAGA TCAAGCGGCA 
551 CGTGCTAGAG TACAATGAGG AGCGCGATGA CTTCGATCTG GAAGCCTAGC 
601 GGATCTCCCA CTTTGCATGG CTGTCTTTTA CAGATGGGAA AACTGAGGCC 
651 TGATGCTGGA GATTCTATGA GGGTGCTCTC CTCAAGGGTA TCAGACGGTC 
701 GTAGGTTCTT AAGAATTTGA TTCATCAGTG GCAGGCCATG CATAGAGCCA 
751 CGGGAGGTGC GTCCTTGTTT TCCAGGAAAT GTTCTTAGAA CTTGGACTAC 
801 TGATTATTAA TTGACTGTGC CTTGGGAAAC AGTGGGAAGT AACTTGGTGC 
851 AGCACTGGGG TATTGTTGGA CTGGTTCAAT TCGTTTAACT CGAATTCTTG 
901 CTCCTGGCCG TGGTTAAGCT GTGTACAGAT GATGGAGAGT TTGGCCTCAA 
951 GTTTTTATAA ACTGAGCGAG ACTAGTGTTC AGGATCTCCT CCCTTGTTTA 

1001 AATGTCAATA AATGCCCCAA CTGCTTTGTA AGCTCAAAAA AAAAAAAAAA 

1051 AAAAAAAAAA AAAAAAAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



no Medline entry 



Peptide information for frame 3 



ORF from 45 bp to 596 bp; peptide length: 184 
Category: strong similarity to known protein 



1 MVRKLKFHEQ KLLKQVDFLN WEVTDHNLHE LRVLRRYRLQ RREDYTRYNQ 

51 LSRAVRELAR RLRDLPERDQ FRVRASAALL DKLYALGLVP TRGSLELCDF 

101 VTASSFCRRR LPTVLLKLRM AQHLQAAVAF VEQGHVRVGP DVVTDPAFLV 

151 TRSMEDFVTW VDSSKIKRHV LEYNEERDDF DLEA 
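The peptide above is the conceptual translation of the ORF starting at bp 45 of the cDNA. As a minimal sketch (assuming the standard genetic code, NCBI translation table 1), the first 36 bp of that ORF, read from the nucleotide listing, reproduce the N-terminus MVRKLKFHEQKL, and the last two codons before the TAG stop at bp 597 give the C-terminal "EA". The names `CODON_TABLE` and `translate` are illustrative, not part of any tool used in the report.

```python
from itertools import product

# Standard genetic code (NCBI table 1): amino acids in TCAG codon order, '*' = stop
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate codon by codon from the first base, stopping at the first stop codon."""
    cds = cds.upper().replace(" ", "")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons with unknown bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# bp 45-80 of DKFZphutel_1811 (start of the ORF) and bp 591-600 (end of the ORF)
print(translate("ATGGTG CGGAAGCTTA AGTTCCACGA GCAGAAGCTG"))  # MVRKLKFHEQKL
print(translate("GAAGCCTAGC"))                               # EA
```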

BLASTP hits 
No BLASTP hits available

Alert BLASTP hits for DKFZphutel_1811, frame 3
No Alert BLASTP hits found 

Pedant information for DKFZphutel_1811, frame 3 



Report for DKFZphutel_1811.3



[LENGTH] 184

[MW] 21850.21

[pI] 9.54

[HOMOL] PIR:S33911 probable ribosomal protein YHR148w - yeast (Saccharomyces
cerevisiae) 4e-47

[FUNCAT] 05.01 ribosomal proteins [S. cerevisiae, YHR148w] 2e-48

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPL081w] 5e-07

[FUNCAT] mRNA translation and ribosome biogenesis [M. jannaschii, MJ0190] 8e-05

[BLOCKS] BL00632

[PIRKW] cytosol 1e-07

[PIRKW] ribosome 1e-07

[PIRKW] protein biosynthesis 1e-07

[SUPFAM] rat ribosomal protein S9 1e-07

[PROSITE] MYRISTYL 1

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 1

[PFAM] Ribosomal protein S4

[KW] All_Alpha

[KW] LOW_COMPLEXITY 6.52 %



SEQ MVRKLKFHEQKLLKQVDFLNWEVTDHNLHELRVLRRYRLQRREDYTRYNQLSRAVRELAR 

SEG xxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RLRDLPERDQFRVRASAALLDKLYALGLVPTRGSLELCDFVTASSFCRRRLPTVLLKLRM 

SEG 

PRD hhhhhccccchhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ AQHLQAAVAFVEQGHVRVGPDVVTDPAFLVTRSMEDFVTWVDSSKIKRHVLEYNEERDDF

SEG 

PRD hhhhhhhhhhhhhhhccccceeecccceeeeeccccceeeeeccchhhhhhhhhcccccc 

SEQ DLEA 

SEG 

PRD cccc 
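The LOW_COMPLEXITY value in each Pedant report appears to be the share of residues masked by SEG (marked "x"): the SEG row above masks 12 of 184 residues, which gives the 6.52 % reported for this entry, and the 5.53 % and 10.29 % reported for DKFZphutel_18il9 and DKFZphutel_19f19 correspond to 42 of 759 and 21 of 204 residues. A minimal sketch of that calculation, assuming this interpretation (the function name is hypothetical):

```python
def low_complexity_percent(seg_row: str, protein_length: int) -> float:
    """Percentage of residues masked by SEG ('x'), rounded to two decimals."""
    return round(100 * seg_row.count("x") / protein_length, 2)


# SEG row of DKFZphutel_1811.3: 12 masked residues in a 184-residue protein
print(low_complexity_percent("xxxxxxxxxxxx", 184))  # 6.52
```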



Prosite for DKFZphutel_1811.3

PS00005  163->166  PKC_PHOSPHO_SITE  PDOC00005
PS00006  153->157  CK2_PHOSPHO_SITE  PDOC00006
PS00006  159->163  CK2_PHOSPHO_SITE  PDOC00006
PS00007  41->49    TYR_PHOSPHO_SITE  PDOC00007
PS00008  87->93    MYRISTYL          PDOC00008



Pfam for DKFZphutel_1811.3



HMM_NAME Ribosomal protein S4 

HMM *MSR . YRGPRWKI IRRPGElPWLTnK tklmrkYC . . lRPgQHgWR 

M+R ++ +++K+++++++L W ++++R Y R+++ ++ 

Query 1 MVRKLKFHEQKLLKQVDFLNWEVTDHNLHELRVLRRYRLQRREDYTRYN 49

HMM qRk t Ls KI RRmSQYr I RLQEKQKLRFM YGNI t ERQLRRYvRiaEdKRKl D 

Q + +R +++ + L+E + +R +++++L++++ +++ L 

Query 50 QLSR — AVRELARRLRDLPERDQFRVRASAALLDKLYALGLVP-TRGSLE 96 

HMM YsTGenLMQILEMRLDNIVFRMGMAPTIHHARQLINHRHIRVNdRIVNIP 
++ + ++++RL++++ ++ MA ++A+ +++++H+RV++ +V++P 
Query 97 LCDFVTASSFCRRRLPTVLLKLRMAQHLQAAVAFVEQGHVRVGPDVVTDP 146

HMM SYiCRPNDilSIRDkqrMQsHIkWnieSPegrmRPNHLErNnkkYeGtIN 
++++++ + +++++W++ S+ ++R+ + Y+ +

Query 147 AFLVTRSMEDFVTWVDSSKIKRHVLEYNEERD- 178

HMM rllEReWiplklNElLVVEY* 
+++ + 
DKFZphutel_19f19



group: transmembrane protein 

DKFZphutel_19f19 encodes a novel 204 amino acid protein with similarity to murine p24 protein.

Murine p24 is expressed only in brain, where it is localized exclusively in neurons. It seems
to be a neuron-specific membrane protein localized in intracellular organelles of highly
differentiated neural cells and may play a role in the neural organelle transport system. Like
p24, the novel protein contains two transmembrane regions, but it lacks the sequence
homologous to the microtubule-binding domain of microtubule-associated proteins that is present in
p24.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes and as a new marker for uterine cells.



similarity to mouse P24 protein;
membrane regions: 2

Summary DKFZphutel_19f19 encodes a novel 204 amino acid protein with
similarity to mouse P24 protein.



similarity to mouse P24 protein 

complete cDNA, complete cds, EST hits, 
2 TM-domains 

Sequenced by AGOWA 

Locus: /map=14.8 cR from top of Chr20 linkage group 
Insert length: 2042 bp 

Poly A stretch at pos. 1958, polyadenylation signal at pos. 1940 



1 GCAGGCAGAG AGATGAGGAA ACTGAGACCC AGAAAGGTGG AAGCACTTGT 
51 CTAAGGTCAC GCCTCCAGGA AGCAGTGTGT CCACGACTCC AGTCCAAGTG 
101 GTCAGGCTCC AGAGCCCACA GTCCCAGGGG TCCATGATGC CGAGCTGCAA 
151 TCGTTCCTGC AGCTGCAGCC GCGGCCCCAG CGTGGAGGAT GGCAAGTGGT 
201 ATGGGGTCCG CTCCTACCTG CACCTCTTCT ATGAGGACTG TGCAGGCACT 
251 GCTCTCAGCG ACGACCCTGA GGGACCTCCG GTCCTGTGCC CCCGCCGGCC 
301 CTGGCCCTCA CTGTGTTGGA AGATCAGCCT GTCCTCGGGG ACCCTGCTTC 
351 TGCTGCTGGG TGTGGCGGCT CTGACCACTG GCTATGCAGT GCCCCCCAAG 
401 CTGGAGGGCA TCGGTGAGGG TGAGTTCCTG GTGTTGGATC AGCGGGCAGC 
451 CGACTACAAC CAGGCCCTGG GCACCTGTCG CCTGGCAGGC ACAGCGCTCT
501 GTGTGGCAGC TGGAGTTCTG CTCGCCATCT GCCTCTTCTG GGCCATGATA 
551 GGCTGGCTGA GCCAGGACAC CAAGGCAGAG CCCTTGGACC CCGAAGCCGA 
601 CAGCCACGTG GAGGTCTTCG GGGATGAGCC AGAGCAGCAG TTGTCACCCA 
651 TTTTCCGCAA TGCCAGTGGC CAGTCATGGT TCTCGCCACC CGCCAGCCCC 
701 TTTGGGCAAT CTTCTGTGCA GACTATCCAG CCCAAGAGGG ACTCCTGAGC 
751 TGCCCACATG GCCTAAGATG TGGGTCCTGG ATCCTTCCCC CTTCTCACCA 
801 TAACCCCCTC TCAGTGTTTC CCCAACTTCT CCCTTTAGAG CCCAACTCCA 
851 GGTCAAATCT GGAGCTCAAA TCCCAGTGCT CCCTCCCCAG GAGTGGGGCC 
901 CCAACTCTTC CAAGATACCA GCATTCCTCA AGTCCTCCCA AAACTTCCTA 
951 CCCACACCCT CTTCCCAAGG CCCTCAGGGG CAGAAAACAT CTCCTTCAAC 
1001 CCGTCCCCAC TCCTTCCTCT GCATGACCTT GGGCAAACCC TTGCCCTTTC 
1051 AAGCCATCAG CTCCTGCCTC TCTGCCATGA GGGCTTTGGA TCAGATTCCT 
1101 CTTCTCGCCA GGATGAGGAC ACGCACTGCC CTCCATAGAC ACAGATGAAG 
1151 GGGTGGGGGT CATTCAGCTC GAATGGGTCC CAGATGCTCA CTTGGCCTTT 
1201 CCCTGCAGGA TGAGTGAAGA CGTTTGCCTC TCACAGTGTG TCTTCTACCT 
1251 GCATTTTGGC ATCAGAGCCC CCCAGCCCAC CCACCACAGG CAATTACTAG 
1301 CCCTAGTTGA TAGGTGAGGT GGGTGAAGAA GGCTGGAGGT GACATGTCCG 
1351 AGGTCACACA ACAAAGCAGC ATGCAGGAAC TAGAAACACA TCTTCAGCCT 
1401 CCTCCTGGGC CAGCTCTTGT GCTACAGGTG GGGCGGAGCC AGCCCCTCAC 
1451 CTTCCTGGTT CCCTGAGGGT CCTCAGGGTG GAGGACAGGT TTGGCCCAGA
1501 AAGACTAGCC AGAGGCCTGA TGGTCCCAGG TGGCTCTGGA TATACTTTGG 
1551 ATATGGATTT AAATGGTCTC TAAGAGCCGG GGGTAGGGGG CAGGAAAAGT 
1601 GGGTTGTCTT TGCCCCTCAA AGTCCACCTA CCTAGAAACC AAGCCCACGG 
1651 TCTTGGCCGT GACCCTGATA ATAAATGGGC TCTCTCAGAG GCGCCAGCCC 
1701 CTCCCTCCCC AGCCGGAGGC GTCATCTCTC TTCTGTACCA CTAGAGGGAG 
1751 CTCTGATGCA GCTGGAGAGC AGCGCTCAAG GCTCTCGCCC CTCCCCTCCC 
1801 TAACCCTTAC CTTCAGTCTC CACCAGCCTG AAGGGCCTCC TAGGGGATCC 
1851 TCAGGCGGCC CCCACCAGGG CACACCCTAC TGTCCTTGTG CCTCACGCCC 
1901 CCTCCTCATC CTGCACCCCT TCCATCCCAC CTTCCCTTTC AATAAACAGC 
1951 TGGGATGGAA AAAAAAAAAA AGAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2001 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AA 
BLAST Results



Entry HS417348 from database EMBL: 
human STS WI-14697. 
Length = 290
Minus Strand HSPs:

Score = 1254 (188.2 bits), Expect = 3.0e-50, P = 3.0e-50
Identities = 262/273 (95%)



Medline entries 



97334404: 

A newly identified membrane protein localized exclusively in 
intracellular organelles of neurons. 



Peptide information for frame 2 



ORF from 134 bp to 745 bp; peptide length: 204 
Category: similarity to known protein 



1 MMPSCNRSCS CSRGPSVEDG KWYGVRSYLH LFYEDCAGTA LSDDPEGPPV 
51 LCPRRPWPSL CWKISLSSGT LLLLLGVAAL TTGYAVPPKL EGIGEGEFLV 
101 LDQRAADYNQ ALGTCRLAGT ALCVAAGVLL AICLFWAMIG WLSQDTKAEP 
151 LDPEADSHVE VFGDEPEQQL SPIFRNASGQ SWFSPPASPF GQSSVQTIQP 
201 KRDS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_19f19, frame 2

TREMBL:MMP2000_1 product: "P24 protein"; Mouse mRNA for P24 protein,
complete cds., N = 1, Score 295, P = 3.8e-26



>TREMBL:MMP2000_1 product: "P24 protein"; Mouse mRNA for P24 protein,
complete cds.

Length = 196

HSPs:

Score = 295 (44.3 bits), Expect = 3.8e-26, P = 3.8e-26
Identities = 58/139 (41%), Positives = 81/139 (58%)

Query: 2 MPSCNRSCSCSRGPSVEDGKW YGVRSYLHLFYEDCAGTALSDDPEGPPVLCPRRPWP 58 

M SC+ +C R + +G + YGVRSYLH FYEDC + + + P R W 

Sbjct: 1 MTSCSNTCGSRRAQADTEGGYQQRYGVRSYLHQFYEDCTASIWEYEDDFQTQRSPNR-WS 59 

Query: 59 SLCWKISLSSGTLLLLLGVAALTTGYAVPPKLEGIGEGEFLVLDQRAADYNQALGTCRLA 118 

S+ WK+ L SGT+ ++LG+ L G+ VPPK+E GE +F+V+D A YN AL TC+LA 
Sbjct: 60 SVFWKVGLISGTVFVILGLTVLAVGFLVPPKIEAFGEADFMVVDTHAVKYNGALDTCKLA 119 

Query: 119 GTALCVAAGVLLAICLFWAM 138 

G L G +A CL ++ 
Sbjct: 120 GAVLFC IGGTSMAGCLLMSV 139 



Pedant information for DKFZphutel_19f19, frame 2



Report for DKFZphutel_19f19.2



[LENGTH] 204

[MW] 21983.07

[pI] 4.69

[HOMOL] TREMBL:MMP2000_1 product: "P24 protein"; Mouse mRNA for P24 protein, complete
cds. 7e-19

[PROSITE] MYRISTYL 4

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 1

[PROSITE] ASN_GLYCOSYLATION 2

[KW] TRANSMEMBRANE 2

[KW] LOW_COMPLEXITY 10.29 %



SEQ MMPSCNRSCSCSRGPSVEDGKWYGVRSYLHLFYEDCAGTALSDDPEGPPVLCPRRPWPSL
PRD cccccccccccccccccccccceeehhhhhccccccccccccccccccccccccccccce
MEM MM

SEQ CWKISLSSGTLLLLLGVAALTTGYAVPPKLEGIGEGEFLVLDQRAADYNQALGTCRLAGT
SEG xxxxxxxxxxxxxxxxxxxxx
PRD eeeeeccccceeecccceeeecccccccccccccccceeeecccccccchhhhhhhhchh
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMM

SEQ ALCVAAGVLLAICLFWAMIGWLSQDTKAEPLDPEADSHVEVFGDEPEQQLSPIFRNASGQ
PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccccccceeeeccccccccccccccccccc
MEM MMMMMMMMMMMMMMMMMMMMMM

SEQ SWFSPPASPFGQSSVQTIQPKRDS
PRD ccccccccccccceeeeccccccc



Prosite for DKFZphutel_19f19.2

PS00001  6->10     ASN_GLYCOSYLATION  PDOC00001
PS00001  176->180  ASN_GLYCOSYLATION  PDOC00001
PS00004  201->205  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  114->117  PKC_PHOSPHO_SITE   PDOC00005
PS00006  16->20    CK2_PHOSPHO_SITE   PDOC00006
PS00006  146->150  CK2_PHOSPHO_SITE   PDOC00006
PS00006  157->161  CK2_PHOSPHO_SITE   PDOC00006
PS00008  38->44    MYRISTYL           PDOC00008
PS00008  92->98    MYRISTYL           PDOC00008
PS00008  119->125  MYRISTYL           PDOC00008
PS00008  127->133  MYRISTYL           PDOC00008

(No Pfam data available for DKFZphutel_19f19.2)
DKFZphutel_19gl9



group: uterus derived 

DKFZphutel_19gl9 encodes a novel 400 amino acid protein, with strong but partial similarity
to a bovine elastin-related protein expressed in fetal calf ligamentum nuchae.

The novel protein contains 2 RGD cell attachment sites.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific 
genes and as a new marker for uterine cells. 



similarity to bovine elastin fragment 
complete cDNA, complete cds, EST hits 
Sequenced by AGOWA 

Locus: map=54.9 cR from top of Chr3 linkage group 
Insert length: 3244 bp 

Poly A stretch at pos. 3227, polyadenylation signal at pos. 3216



1 GTAACTGCAG TAAGTCCCGC TTGGCCCTGG AGTCCACGCG GATTTTCGAA 
51 GCTGGGGCTG GCAAGAGGCC GCTGGACACC ACGCTCCAGT CGTCAGCCCA 
101 CTTCCTAGCT GAACAGCGCG AGGCGGCGGC AGCGAGCCGG GTCCCACCAT 
151 GGCCGCGAAT TATTCCAGTA CCAGTACCCG GAGAGAACAT GTCAAAGTTA 
201 AAACCAGCTC CCAGCCAGGC TTCCTGGAAC GGCTGAGCGA GACCTCGGGT 
251 GGGATGTTTG TGGGGCTCAT GGCCTTCCTG CTCTCCTTCT ACCTAATTTT 
301 CACCAATGAG GGCCGCGCAT TGAAGACGGC AACCTCATTG GCTGAGGGGC 
351 TCTCGCTTGT GGTGTCTCCT GACAGCATCC ACAGTGTGGC TCCGGAGAAT 
401 GAAGGAAGGC TGGTGCACAT CATTGGCGCC TTACGGACAT CCAAGCTTTT 
451 GTCTGATCCA AACTATGGGG TCCATCTTCC GGCTGTGAAA CTGCGGAGGC 
501 ACGTGGAGAT GTACCAATGG GTAGAAACTG AGGAGTCCAG GGAGTACACC 
551 GAGGATGGGC AGGTGAAGAA GGAGACGAGG TATTCCTACA ACACTGAATG 
601 GAGGTCAGAA ATCATCAACA GCAAAAACTT CGACCGAGAG ATTGGCCACA 
651 ATAACCCCAG TGCCATGGCA GTGGAGTCAT TCACGGCAAC AGCCCCCTTT 
701 GTCCAAATTG GCAGGTTTTT CCTCTCGTCA GGCCTCATCG ACAAAGTCGA 
751 CAACTTCAAG TCCCTGAGCC TATCCAAGCT GGAGGACCCT CATGTGGACA 
801 TCATTCGCCG TGGAGACTTT TTCTACCACA GCGAAAATCC CAAGTATCCA 
851 GAGGTGGGAG ACTTGCGTGT CTCCTTTTCC TATGCTGGAC TGAGCGGCGA 
901 TGACCCTGAC CTGGGCCCAG CTCACGTGGT CACTGTGATT GCCCGGCAGC 
951 GGGGTGACCA GCTAGTCCCA TTCTCCACCA AGTCTGGGGA TACCTTACTG 
1001 CTCCTGCACC ACGGGGACTT CTCAGCAGAG GAGGTGTTTC ATAGAGAACT 
1051 AAGGAGCAAC TCCATGAAGA CCTGGGGCCT GCGGGCAGCT GGCTGGATGG 
1101 CCATGTTCAT GGGCCTCAAC CTTATGACAC GGATCCTCTA CACCTTGGTG 
1151 GACTGGTTTC CTGTTTTCCG AGACCTGGTC AACATTGGCC TGAAAGCCTT 
1201 TGCCTTCTGT GTGGCCACCT CGCTGACCCT GCTGACCGTG GCGGCTGGCT 
1251 GGCTCTTCTA CCGACCCCTG TGGGCCCTCC TCATTGCCGG CCTGGCCCTT 
1301 GTGCCCATCC TTGTTGCTCG GACACGGGTG CCAGCCAAAA AGTTGGAGTG 
1351 AAAAGACCCT GGCACCCGCC CGACACCTGC GTGAGCCCTA GGATCCAGGT 
1401 CCTCTCTCAC CTCTGACCCA GCTCCATGCC AGAGCAGGAG CCCCGGTCAA 
1451 TTTTGGACTC TGCACCCCCT CTCCTCTTCA GGGGCCAGAC TTGGCAGCAT 
1501 GTGCACCAGG TTGGTGTTCA CCAGCTCATG TCTTCCCCAC ATCTCTTCTT 
1551 GCCAGTAAGC AGCTTTGGTG GGCAGCAGCA GCCATGAATG GCAAGCTGAC 
1601 AGCTTCTCCT GCTGTTTCCT TCCTCTCTTG GACTGAGTGG GTACGGCCAG 
1651 CCACTCAGCC CATTGGCAGC TGACAACGCA GACACGCTCT ACGGAGGCCT 
1701 GCTGATAAAG GGCTCAGCCT TGCCGTGTGC TGCTTCTCAT CACTGCACAC 
1751 AAGTGCCATG CTTTGCCACC ACCACCAAGC ACATCTGTGA TCCTGAAGGG 
1801 CGGCCGTTAG TCATTACTGC TGAGTCCTGG GTCACCAGCA GACACACTGG 
1851 GCATGGACCC CTCAAAGCAG GCACACCCAA AACACAAGTC TGTGGCTAGA 
1901 ACCTGATGTG GTGTTTAAAA GAGAAGAAAC ACTGAAGATG TCCTGAGGAG 
1951 AAAAGCTGGA CATATACTGG GCTTCACACT TATCTTATGG CTTGGCAGAA 
2001 TCTTTGTAGT GTGTGGGATC TCTGAAGGCC CTATTTAAGT TTTTCTTCGT 
2051 TACTTTGCTG CTTCATGTGT ACTTTCCTAC CCCAAGAGGA AGTTTTCTGA 
2101 AATAAGATTT AAAAACAAAA CAAAAAAAAC ACTTAATATT TCAGACTGTT 
2151 ACAGGAAACA CCCTTTAGTC TGTCAGTTGA ATTCAGAGCA CTGAAAGGTG 
2201 TTAAATTGGG GTATGTGGTT TGATTGATAA AAAGTTACCT CTCAGTATTT 
2251 TGTGTCACTG AGAAGCTTTA CAATGGATGC TTTTGAAACA AGTATCAGCA 
2301 AAAGGATTTG TTTTCACTCT GGGAGGAGAG GGTGGAGAAA GCACTTGCTT 
2351 TCATCCTCTG GCATCGGAAA CTCCCCTATG CACTTGAAGA TGGTTTAAAA 
2401 GATTAAAGAA ACGATTAAGA GAAAAGGTTG GAAGCTTTAT ACTAAATGGG 
2451 CTCCTTCATG GTGACGCCCC GTCAACCACA ATCAAGAACT GAGGCCTGAG 
2501 GCTGGTTGTA CAATGCCCAC GCCTGCCTGG CTGCTTTCAC CTGGGAGTGC 
2551 TTTCGATGTG GGCACCTGGG CTTCCTAGGG CTGCTTCTGA GTGGTTCTTT 
2601 CACGTGTTGT GTCCATAGCT TTAGTCTTCC TAAATAAGAT CCACCCACAC 
2651 CTAAGTCACA GAATTTCTAA GTTCCCCAAC TACTCTCACA CCCTTTTAAA
2701 GATAAAGTAT GTTGTAACCA GGATGTCTTA AATGATTCTT TGTGTACCTT
2751 TTCTGTCATA TTCAGAAACC GTTTTGTGCC TGCTGGGAGT AATTCCTTTA
2801 GCAATTAAGT ATTTGGTAGC TGAATAAGGG GTCAGAACTT CTGAAACCAG
2851 AGATCTGTAA TCATCTCTAT TGGCCTGGGG TGCCTGTGCT ATAAATGAGT
2901 TTCTTCACAT GAAAAACACA GCCAGCCCAA GATGACTTAT CTGGGTTTAG
2951 GATTCAATAG TATTCACTAA CTGCTTATTA CATGAGCAAT TTCATCAAAT
3001 CTCCAAACTC TTAAAGGATG CTTTCGGAAA ACACGCTGTA TACCTAGATG
3051 ATGACTAAAT GCAAAATCCT TGGGCTTTGG TTTTTTTCTA GTAAGGATTT
3101 TAAATAACTG CCGACTTCAA AAGTGTTCTT AAAACGAAAG ATAATGTTAA
3151 GAAAAATTTG AAAGCTTTGG AAAACCAAAT TTGTAATATC ATTGTATTTT
3201 TTATTAAAAG TTTTGTAATA AATTTCTAAA AAAAAAAAAA AAAA



BLAST Results 



Entry HS545355 from database EMBL:
human STS WI-14815.
Length = 436
Minus Strand HSPs:

Score = 2040 (306.1 bits), Expect = 6.2e-86, P = 6.2e-86
Identities = 420/426 (98%)

Entry HS932147 from database EMBL:
human STS WI-8531.
Length = 341
Minus Strand HSPs:

Score = 1705 (255.8 bits), Expect = 4.7e-70, P = 4.7e-70
Identities = 341/341 (100%)
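The Identities and Positives percentages in these listings are consistent with truncating rather than rounding: 420/426 = 98.6 % is shown as 98 %, and 34/41 = 82.9 % as 82 %. The sketch below counts identical positions in two gapped alignment rows and applies that assumed truncation; the function names are illustrative, and gaps are assumed to be written as `-`.

```python
def whole_percent(part: int, total: int) -> int:
    """Integer percentage truncated toward zero (assumed BLAST convention)."""
    return (100 * part) // total


def identity_stats(query: str, sbjct: str) -> tuple[int, int, int]:
    """Return (identities, alignment length, truncated percent identity)."""
    if len(query) != len(sbjct):
        raise ValueError("alignment rows must have equal length")
    # A position counts as identical only if both residues match and neither is a gap.
    matches = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return matches, len(query), whole_percent(matches, len(query))
```

Applied to the second HSP block of DKFZphutel_19f19 (Query 119-138 against Sbjct 120-139), the six identical positions give 6/20, or 30 %.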



Medline entries 



86051793: 

Bovine elastin cDNA clones: evidence for the occurrence of a 
new elastin-related protein in fetal calf ligamentum nuchae. 



Peptide information for frame 2 



ORF from 149 bp to 1348 bp; peptide length: 400
Category: similarity to known protein 
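For the clones in this section, (end - start + 1) / 3 reproduces each stated peptide length (149-1348 gives 400, 51-1220 gives 390, 72-2708 gives 879), which suggests the ORF coordinates are 1-based, inclusive, and exclude the stop codon; for DKFZphutel_19gl9 the TGA stop sits at 1349-1351. The sketch below encodes that reading together with a translator for the standard genetic code (NCBI translation table 1). The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def peptide_length(orf_start: int, orf_end: int) -> int:
    """Codons in an ORF given 1-based inclusive coordinates (stop excluded)."""
    span = orf_end - orf_start + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


def translate(cds: str) -> str:
    """Translate DNA codon by codon; '*' marks a stop codon."""
    cds = cds.upper().replace(" ", "")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))
```

Translating the first 30 nucleotides from position 149 of DKFZphutel_19gl9 (ATGGCCGCGAATTATTCCAGTACCAGTACC) gives MAANYSSTST, the first ten residues of the peptide listed below.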



1 MAANYSSTST RREHVKVKTS SQPGFLERLS ETSGGMFVGL MAFLLSFYLI
51 FTNEGRALKT ATSLAEGLSL VVSPDSIHSV APENEGRLVH IIGALRTSKL
101 LSDPNYGVHL PAVKLRRHVE MYQWVETEES REYTEDGQVK KETRYSYNTE
151 WRSEIINSKN FDREIGHNNP SAMAVESFTA TAPFVQIGRF FLSSGLIDKV
201 DNFKSLSLSK LEDPHVDIIR RGDFFYHSEN PKYPEVGDLR VSFSYAGLSG
251 DDPDLGPAHV VTVIARQRGD QLVPFSTKSG DTLLLLHHGD FSAEEVFHRE
301 LRSNSMKTWG LRAAGWMAMF MGLNLMTRIL YTLVDWFPVF RDLVNIGLKA
351 FAFCVATSLT LLTVAAGWLF YRPLWALLIA GLALVPILVA RTRVPAKKLE

BLASTP hits

Entry I45887 from database PIR:
elastin - bovine (fragment)
Length = 40

Score = 131 (46.1 bits), Expect = 4.9e-08, P = 4.9e-08
Identities = 31/41 (75%), Positives = 34/41 (82%)



Alert BLASTP hits for DKFZphutel_19gl9, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphutel_19gl9, frame 2



Report for DKFZphutel_19gl9.2

[LENGTH]    400
[MW]        44831.53
[pI]        7.23
[HOMOL]     PIR:I45887 elastin - bovine (fragment) 1e-06
[PROSITE]   RGD 2
[PROSITE]   MYRISTYL 3
[PROSITE]   CAMP_PHOSPHO_SITE 1
[PROSITE]   CK2_PHOSPHO_SITE 6
[PROSITE]   TYR_PHOSPHO_SITE 2
[PROSITE]   PKC_PHOSPHO_SITE 5
[PROSITE]   ASN_GLYCOSYLATION 1
[KW]        TRANSMEMBRANE 4



SEQ MAANYSSTSTRREHVKVKTSSQPGFLERLSETSGGMFVGLMAFLLSFYLIFTNEGRALKT
PRD ccceeecccceeeeeeeecccccceeeecccccccchhhhhhhhhhheeeeecccchhhh
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM . .

SEQ ATSLAEGLSLVVSPDSIHSVAPENEGRLVHIIGALRTSKLLSDPNYGVHLPAVKLRRHVE
PRD hhhhhccceeeeccccceeeeccccceeeeeeeeeeceeeccccccccccchhhhhhhhh
MEM

SEQ MYQWVETEESREYTEDGQVKKETRYSYNTEWRSEIINSKNFDREIGHNNPSAMAVESFTA
PRD hheeehhhhheeecccccccceeeccccccceeeeeeccccceeecccccceeeeeeecc
MEM M

SEQ TAPFVQIGRFFLSSGLIDKVDNFKSLSLSKLEDPHVDIIRRGDFFYHSENPKYPEVGDLR
PRD ccceeeeeeeeeccccccccccceeeeeeeccccceeeeecccceeecccccccccccee
MEM MMMMMMMMMMMMMMMMM

SEQ VSFSYAGLSGDDPDLGPAHVVTVIARQRGDQLVPFSTKSGDTLLLLHHGDFSAEEVFHRE
PRD eeccccccccccccccceeeeeeeeecccccccccccccceeeeeecccccchhhhhhhh
MEM

SEQ LRSNSMKTWGLRAAGWMAMFMGLNLMTRILYTLVDWFPVFRDLVNIGLKAFAFCVATSLT
PRD hhccccccccchhhhhhhhhhhchhhhhhhhheeecccccccccccceeeeeeeeehhhh
MEM MMMMMMMMMMMMMMMMMMMMMMMM MMMM

SEQ LLTVAAGWLFYRPLWALLIAGLALVPILVARTRVPAKKLE
PRD hhhhhccceeehhhhhhhhhhhhchhhhhhhhcccccccc
MEM MMMMMMMMMMMMMMMMMMMMMMMMM



Prosite for DKFZphutel_19gl9.2

PS00001   4->8      ASN_GLYCOSYLATION    PDOC00001
PS00004   140->144  CAMP_PHOSPHO_SITE    PDOC00004
PS00005   9->12     PKC_PHOSPHO_SITE     PDOC00005
PS00005   10->13    PKC_PHOSPHO_SITE     PDOC00005
PS00005   97->100   PKC_PHOSPHO_SITE     PDOC00005
PS00005   276->279  PKC_PHOSPHO_SITE     PDOC00005
PS00005   305->308  PKC_PHOSPHO_SITE     PDOC00005
PS00006   10->14    CK2_PHOSPHO_SITE     PDOC00006
PS00006   63->67    CK2_PHOSPHO_SITE     PDOC00006
PS00006   209->213  CK2_PHOSPHO_SITE     PDOC00006
PS00006   249->253  CK2_PHOSPHO_SITE     PDOC00006
PS00006   292->296  CK2_PHOSPHO_SITE     PDOC00006
PS00006   332->336  CK2_PHOSPHO_SITE     PDOC00006
PS00007   220->227  TYR_PHOSPHO_SITE     PDOC00007
PS00007   99->107   TYR_PHOSPHO_SITE     PDOC00007
PS00008   35->41    MYRISTYL             PDOC00008
PS00008   93->99    MYRISTYL             PDOC00008
PS00008   310->316  MYRISTYL             PDOC00008
PS00016   221->224  RGD                  PDOC00016
PS00016   268->271  RGD                  PDOC00016
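The start->end pairs in these PROSITE tables appear to give the first residue of a match and one position past the last: PS00001 at 4->8 covers N-Y-S-S at residues 4-7, and PS00016 at 221->224 covers R-G-D at 221-223. The sketch below reproduces that convention with regular-expression renderings of three PROSITE patterns. It is a simplification: it does not implement the full PROSITE pattern syntax or the filtering applied by tools such as ScanProsite, and the dictionary and function names are hypothetical.

```python
import re

# Simplified regex versions of three PROSITE patterns.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "RGD": r"RGD",                          # PS00016: R-G-D
}


def scan(pattern: str, seq: str, first: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) for every match, overlapping ones included.

    start is the number of the first matched residue and end is one past
    the last; `first` is the residue number of seq[0]."""
    hits = []
    # The zero-width lookahead lets overlapping sites (e.g. 9->12 and 10->13) both match.
    for m in re.finditer(f"(?=({pattern}))", seq):
        start = m.start() + first
        hits.append((start, start + len(m.group(1))))
    return hits
```

On the first 20 residues of DKFZphutel_19gl9.2 (MAANYSSTSTRREHVKVKTS) this yields the 4->8 glycosylation site and the overlapping 9->12 and 10->13 PKC sites listed above.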



(No Pfam data available for DKFZphutel_19gl9.2) 
DKFZphutel_19g22



group: cell structure and motility 

DKFZphutel_19g22 encodes a novel 390 amino acid protein with very strong similarity to
tuftelin/enamelin.

Tuftelin/enamelin are matrix proteins of the teeth. Like other proteins involved in
calcification, these proteins are also expressed in the uterine matrix.

The new protein can find application in modulating tissue calcification, especially in the
uterus.



complete cDNA, complete cds starting at bp 51, EST hits in 3' UTR,
human homolog of mouse tuftelin

tuftelin is described as a matrix protein of teeth, but it also seems
to be present in the uterine matrix

Sequenced by AGOWA 

Locus : unknown 

Insert length: 3110 bp 

Poly A stretch at pos. 3093, polyadenylation signal at pos. 3071 



1 GCAGACAGCG GGGTGGACAA GTGGCGTGTG TGCTGCGACC CCGAGGGAAG 
51 ATGAACGGGA CGCGGAACTG GTGTACCCTG GTGGACGTGC ACCCAGAGGA 
101 CCAGGCGGCG GGCAGCGTGG ACATTCTCAG GCTGACTCTC CAGGGTGAAC 
151 TGACAGGAGA TGAACTTGAA CACATAGCCC AGAAGGCGGG CAGGAAGACC 
201 TATGCCATGG TGTCCAGCCA CTCAGCTGGT CATTCTCTGG CTTCAGAACT 
251 GGTGGAGTCC CATGATGGAC ATGAGGAGAT CATTAAGGTG TACTTGAAGG 
301 GGAGGTCTGG AGACAAGATG ATTCACGAGA AGAATATTAA CCAGCTGAAG
351 AGTGAGGTCC AGTACATCCA GGAGGCCAGG AACTGCCTAC AGAAGCTCCG 
401 GGAGGATATA AGTAGCAAGC TTGACAGGAA CCTAGGAGAT TCTCTCCATC 
451 GACAGGAGAT ACAGGTGGTG CTAGAAAAGC CAAATGGCTT TAGTCAGAGT 
501 CCCACAGCCC TGTACAGCAG CCCACCTGAG GTGGACACCT GTATAAATGA 
551 GGATGTTGAG AGCTTGAGGA AGACGGTGCA GGACTTGCTG GCCAAGCTTC 
601 AGGAGGCCAA GCGGCAACAC CAGTCAGACT GTGTGGCTTT TGAGGTCACA 
651 CTCAGCCGGT ACCAGAGGGA AGCAGAACAA AGTAATGTGG CCCTTCAGAG 
701 AGAGGAGGAC AGAGTGGAGC AGAAAGAGGC AGAAGTCGGA GAGCTGCAGA 
751 GGCGCTTGCT AGGGATGGAG ACGGAGCATC AGGCCTTACT GGCGAAAGTG 
801 AGGGAAGGGG AGGTGGCCCT AGAGGAACTT CGGAGCAACA ATGCTGACTG 
851 CCAAGCAGAA CGAGAAAAGG CTGCTACCCT GGAAAAGGAA GTGGCCGGGT 
901 TGCGGGAGAA GATCCACCAC TTGGATGACA TGCTCAAGAG CCAGCAGCGG 
951 AAAGTCCGGC AAATGATAGA GCAGCTCCAG AATTCAAAAG CTGTGATCCA 
1001 GTCAAAGGAC GCCACCATCC AGGAGCTCAA GGAGAAAATC GCCTATCTGG 
1051 AGGCAGAGAA TTTAGAGATG CATGACCGGA TGGAACACCT GATAGAAAAA 
1101 CAAATCAGTC ATGGCAACTT CAGCACCCAG GCCCGGGCCA AGACAGAGAA 
1151 CCCGGGCAGT ATTAGGATAT CCAAGCCGCC TAGCCCGAAG CCCATGCCTG 
1201 TCATCCGAGT GGTGGAAACC TGAGCTGCCT GGAGATGGTT GCTGCCATTG 
1251 CTGCTGCCTC TGCCTCGGAG AAGCCCACTG CCCCTGTTGG CTGTTAACAC 
1301 TGCCTTTGAC TTCCTGACTG TCCCCTGGCT GCACCCAGGA CTTCGGGCTC 
1351 CTGTGTCTCA CCATTCCCAA GCCCCTGGCC ACTCTAAGCT GGGCAGACGG 
1401 AGCACGAGCA CCTATTCAAG GCACTGCAGC CCTTTGGAAG ACATTGTCCT 
1451 GCAAGCAGGA GCCAGGGCAA TATCTATATT CCTACAGTGA CTATTTTTCT 
1501 CTGTAGAGAG CCTCCCTTCT GTTGTAGACT GGACTCTGGC TGCGCCATAA 
1551 GCCAGGCCTT CATCAGATTG GGAGAGGTGA CAAGATTTGC CTCAGCCCTA 
1601 AAAGCTGGAG ACACAGATGT CCAGAGTGAT TGGAGAATGT CCTGGGGGAA 
1651 TGAAGTTCCT TCCACAAACA CAGCTCAGTT CTTAGCAACA AACTGTTTGT 
1701 TTTTCTACTT GCTCCATCTG CAGCCTACGC TGCCCTGGCC TCCTGCAGAC 
1751 AGATAGTGGG GTTACCTGGC AAGGCCTGGT GAGAGCCAGT GAACCTAAGC 
1801 TTTGACTGGG TGGCCTTGTC TTTCTGGGGA GGAGGGAATG TACATTCAGG 
1851 GAGTAGCCTT TTGCGGAAAA ATTCTCTAGG GCTACAGACA GTCATGTGTG 
1901 ACTTCTCTCT GCTGTGAAAA CTCCCAGAGT CTCTTTAGGG ATTTTCCCTA 
1951 AGGTGTACCA CCAGGCACAC CTCAGTCTTC TTGACCCAGA GCCTGAAAAC 
2001 TGTTTTCACT GGGTTCCACC AGTCCCAGCA AAATCCTCTT TGTATTTATT 
2051 TTGCTAAGTT ATTGGTGGTT TTGCTTACAT CTCATGATTG ATATAATACC 
2101 AAAGTTCTAT AGCCTTCTCT TGCAGTATTT GGATTTGCTT GAAACCGGGA 
2151 AAACTGTTCC CATTAGGCTT GTTAATGTCA GAGTGACACT ATTATGAATC 
2201 TTTCTCTCCC TTTCCTCTGC CTGTTTCTTC TCTCTTTCTC CTTCAAACTT 
2251 GCTCTGCAGC TAAGGAAGGT GAGTCTACTT TCCCTGAGGC TTTGGGGTCA 
2301 GAGTATATGT TGTTTGGAGA AAGAGGGCAA TCAGGACTCT TCTGGGACCC
2351 AGATGAGTTC TTCACTAGCC CTTCTGAACC CCTTGCTCCA TAATTGGTCT 
2401 TTTATCCTGG CTCTGAATGA CCCTGCAGGT CATCATGGTT TTCTTTTTTT 
2451 ATTGTTTTTT TTTTTTTCTG AGACAGAGTC TCACTCTGTC ACCCAGGCTG
2501 GAGTGCAGTG GCGCGATCTC AGCTCACTGC AACCTCTGCC TCCCGGATTT
2551 AAGCGATTCT TCTGCCTCAG CCTCCCGAGT AGCTGGGACT ACAGGTGTGC
2601 CACCACGCCT GGCTGATTTT TGTATTTTTA GTAGAGATGG GGTTTCACCA
2651 TACTGGCTAG GCTGGTCTCG AATTCCTGAC CTCAGGTGAT CCACCCACCT
2701 CGGCTTCCCA AAGTGCTAGG ATTATAGGCT TGAGCTACTG TGCCCGGCCC
2751 ATGGTGTTTT TCTTTAGGGC TCTTCCTACA GCCTTGAGAA GTAGATAGGC
2801 ATCAGAGTAT GGTACTATAG GAATCAGAAA AATTCAAAAC AAATGTGGAT
2851 TAAGTGTTTA GGCTCTATGT GGCTCACGCA GCCAGAATCC TTAAGTCTGT
2901 GTGTTTCTGT GTCTCAAGAC TGGGCTCACA TTCTGGCTTT GTCCATAACA
2951 ATGCTCTGGG ATTTCAGGGA GTTCCCTCAT TTGTAAAATG AGGGGGTCAG
3001 AGCAGGTGAT ATCCATGTTT CTTCCCTTTC TGATATTGTT GTCTGTGGCA
3051 TATTCTTTGT ATGGCGAATT TAATAAATTA TATTAATGTG TCTAAAAAAA
3101 AAAAAAAAAA
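The header of this entry gives the poly A stretch at pos. 3093 and the polyadenylation signal at pos. 3071. In the 1-based numbering of the listing, AATAAA starts at nucleotide 3072 and the terminal A run at 3094, so the header positions appear to be 0-based offsets; the same reading fits DKFZphutel_19hl7 (3784 and 3811). The sketch below locates both features under that assumption. It searches only for the canonical AATAAA hexamer (variants such as ATTAAA are not considered), and the function name is hypothetical.

```python
def polya_features(seq: str, offset: int = 0) -> tuple[int, int]:
    """Return (signal_pos, tail_pos) as 0-based positions in the full insert.

    `offset` is the 0-based position of seq[0]; the signal reported is the
    last AATAAA that lies entirely upstream of the terminal poly(A) run."""
    seq = seq.upper().replace(" ", "")
    tail_len = len(seq) - len(seq.rstrip("A"))
    if tail_len == 0:
        raise ValueError("sequence has no 3' poly(A) stretch")
    tail_pos = len(seq) - tail_len
    signal = seq.rfind("AATAAA", 0, tail_pos)
    if signal < 0:
        raise ValueError("no AATAAA upstream of the poly(A) stretch")
    return offset + signal, offset + tail_pos
```

Passing the last two listed lines (from nucleotide 3051, i.e. offset 3050) returns (3071, 3093), matching the header.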



BLAST Results 



No BLAST result 



Medline entries 



98200312: 

Tuftelin--aspects of protein and gene structure

97228909:

Timing of the expression of enamel gene products during mouse tooth
development.

91340750: 

Sequencing of bovine enamelin ("tuftelin") a novel acidic enamel 
protein. 



Peptide information for frame 3 



ORF from 51 bp to 1220 bp; peptide length: 390 
Category: strong similarity to known protein 



1 MNGTRNWCTL VDVHPEDQAA GSVDILRLTL QGELTGDELE HIAQKAGRKT 

51 YAMVSSHSAG HSLASELVES HDGHEEIIKV YLKGRSGDKM IHEKNINQLK 

101 SEVQYIQEAR NCLQKLREDI SSKLDRNLGD SLHRQEIQVV LEKPNGFSQS 

151 PTALYSSPPE VDTCINEDVE SLRKTVQDLL AKLQEAKRQH QSDCVAFEVT 

201 LSRYQREAEQ SNVALQREED RVEQKEAEVG ELQRRLLGME TEHQALLAKV 

251 REGEVALEEL RSNNADCQAE REKAATLEKE VAGLREKIHH LDDMLKSQQR

301 KVRQMIEQLQ NSKAVIQSKD ATIQELKEKI AYLEAENLEM HDRMEHLIEK

351 QISHGNFSTQ ARAKTENPGS IRISKPPSPK PMPVIRVVET 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_19g22, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_19g22, frame 3 



Report for DKFZphutel_19g22.3

[LENGTH]    390
[MW]        44264.09
[pI]        5.68
[HOMOL]     TREMBL:AF047704_1 product: "tuftelin"; Mus musculus tuftelin mRNA, complete cds. 0.0
[FUNCAT]    08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 2e-11
[FUNCAT]    30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 2e-11
[FUNCAT]    1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 7e-11
[FUNCAT]    09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 1e-08
[FUNCAT]    03.22.01 cell cycle check point proteins [S. cerevisiae, YGL086w] 6e-08
[FUNCAT]    30.10 nuclear organization [S. cerevisiae, YGL086w] 6e-08
[FUNCAT]    03.13 meiosis [S. cerevisiae, YNL250w] 7e-08
[FUNCAT]    03.19 recombination and dna repair [S. cerevisiae, YNL250w] 7e-08
[FUNCAT]    11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-07
[FUNCAT]    03.22 cell cycle control and mitosis [S. cerevisiae, YDR285w] 2e-07
[FUNCAT]    30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 2e-07
[FUNCAT]    99 unclassified proteins [S. cerevisiae, YOR216c] 1e-05
[FUNCAT]    01.03.16 polynucleotide degradation [S. cerevisiae, YNL243w] 1e-04
[FUNCAT]    03.04 budding, cell polarity and filament formation [S. cerevisiae, YNL243w] 1e-04
[FUNCAT]    30.04 organization of cytoskeleton [S. cerevisiae, YNL243w] 1e-04
[FUNCAT]    03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YNL243w] 1e-04
[FUNCAT]    08.19 cellular import [S. cerevisiae, YNL243w] 1e-04
[FUNCAT]    06.10 assembly of protein complexes [S. cerevisiae, YNL243w] 1e-04
[FUNCAT]    08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-04
[FUNCAT]    03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-04
[FUNCAT]    09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 4e-04
[FUNCAT]    30.05 organization of centrosome [S. cerevisiae, YMR294w] 7e-04
[EC]        3.6.1.32 Myosin ATPase 8e-09
[PIRKW]     blocked amino end 1e-07
[PIRKW]     nucleus 1e-06
[PIRKW]     citrulline 1e-07
[PIRKW]     tandem repeat 8e-09
[PIRKW]     heterodimer 3e-06
[PIRKW]     DNA repair 2e-06
[PIRKW]     heart 8e-09
[PIRKW]     endocytosis 3e-07
[PIRKW]     transmembrane protein 4e-10
[PIRKW]     zinc finger 3e-07
[PIRKW]     metal binding 3e-07
[PIRKW]     muscle contraction 8e-09
[PIRKW]     acetylated amino end 1e-06
[PIRKW]     actin binding 8e-09
[PIRKW]     microtubule binding 1e-06
[PIRKW]     cell division control 1e-06
[PIRKW]     ATP 8e-09
[PIRKW]     chromosomal protein 3e-06
[PIRKW]     thick filament 8e-09
[PIRKW]     phosphoprotein 1e-145
[PIRKW]     skeletal muscle 8e-09
[PIRKW]     calcium binding 1e-07
[PIRKW]     meiosis 2e-06
[PIRKW]     alternative splicing 7e-08
[PIRKW]     DNA condensation 3e-06
[PIRKW]     coiled coil 4e-10
[PIRKW]     P-loop 8e-09
[PIRKW]     heptad repeat 1e-07
[PIRKW]     methylated amino acid 8e-09
[PIRKW]     immunoglobulin receptor 2e-06
[PIRKW]     peripheral membrane protein 3e-07
[PIRKW]     cardiac muscle 8e-09
[PIRKW]     hydrolase 8e-09
[PIRKW]     muscle 7e-08
[PIRKW]     EF hand 1e-07
[PIRKW]     cytoskeleton 7e-08
[PIRKW]     hair 1e-07
[PIRKW]     smooth muscle 7e-08
[PIRKW]     calmodulin binding 3e-07
[SUPFAM]    conserved hypothetical P115 protein 2e-09
[SUPFAM]    myosin heavy chain 8e-09
[SUPFAM]    RAD 50 protein 2e-06
[SUPFAM]    calmodulin repeat homology 1e-07
[SUPFAM]    myosin motor domain homology 8e-09
[SUPFAM]    alpha-actinin actin-binding domain homology 1e-06
[SUPFAM]    tropomyosin 7e-08
[SUPFAM]    protein-tyrosine kinase ret 3e-07
[SUPFAM]    plectin 1e-06
[SUPFAM]    trichohyalin 1e-07
[SUPFAM]    pleckstrin repeat homology 2e-06
[SUPFAM]    ribosomal protein S10 homology 1e-06
[SUPFAM]    protein kinase homology 3e-07
[SUPFAM]    protein kinase C zinc-binding repeat homology 2e-06
[SUPFAM]    giantin 4e-06
[SUPFAM]    kinesin-related protein KLPA 1e-06
[SUPFAM]    kinesin motor domain homology 1e-06
[SUPFAM]    human early endosome antigen 1 3e-07
[SUPFAM]    M5 protein 2e-06
[PROSITE]   MYRISTYL 1
[PROSITE]   AMIDATION 1
[PROSITE]   CK2_PHOSPHO_SITE 6
[PROSITE]   PKC_PHOSPHO_SITE 4
[PROSITE]   ASN_GLYCOSYLATION 2
[KW]        All_Alpha
[KW]        LOW_COMPLEXITY 4.62 %
[KW]        COILED_COIL 35.13 %
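The [KW] percentages appear to be the share of residues flagged in the corresponding per-residue track: 18 masked residues out of 390 would give the 4.62 % LOW_COMPLEXITY figure here, 21 out of 204 the 10.29 % for DKFZphutel_19f19.2, and 35.13 % COILED_COIL would correspond to 137 of 390 residues. The sketch below computes such a figure, assuming the track string has one character per residue (the printed tracks in this listing have lost that alignment). The function name is hypothetical.

```python
def flagged_percent(track: str, flag: str) -> float:
    """Percentage of positions in a per-residue track that carry `flag`,
    rounded to two decimals as in the [KW] lines."""
    if not track:
        raise ValueError("empty track")
    return round(100 * track.count(flag) / len(track), 2)
```

For example, a SEG track of 390 positions with 18 `X` characters evaluates to 4.62.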



SEQ MNGTRNWCTLVDVHPEDQAAGSVDILRLTLQGELTGDELEHIAQKAGRKTYAMVSSHSAG
SEG
PRD cccccceeeeeeeccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS

SEQ HSLASELVESHDGHEEIIKVYLKGRSGDKMIHEKNINQLKSEVQYIQEARNCLQKLREDI
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS

SEQ SSKLDRNLGDSLHRQEIQVVLEKPNGFSQSPTALYSSPPEVDTCINEDVESLRKTVQDLL
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCCCCCCCCC

SEQ AKLQEAKRQHQSDCVAFEVTLSRYQREAEQSNVALQREEDRVEQKEAEVGELQRRLLGME
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ TEHQALLAKVREGEVALEELRSNNADCQAEREKAATLEKEVAGLREKIHHLDDMLKSQQR
SEG
PRD hhhhhhhhhhhhhhhhhhhh
COILS CC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ KVRQMIEQLQNSKAVIQSKDATIQELKEKIAYLEAENLEMHDRMEHLIEKQISHGNFSTQ
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ ARAKTENPGSIRISKPPSPKPMPVIRVVET
SEG XXXXXXXXXXXXXXXXXX . . .
PRD hhcccccccceeeecccccccccceeeccc
COILS



Prosite for DKFZphutel_19g22.3

PS00001   2->6      ASN_GLYCOSYLATION    PDOC00001
PS00001   356->360  ASN_GLYCOSYLATION    PDOC00001
PS00005   121->124  PKC_PHOSPHO_SITE     PDOC00005
PS00005   171->174  PKC_PHOSPHO_SITE     PDOC00005
PS00005   370->373  PKC_PHOSPHO_SITE     PDOC00005
PS00005   378->381  PKC_PHOSPHO_SITE     PDOC00005
PS00006   9->13     CK2_PHOSPHO_SITE     PDOC00006
PS00006   35->39    CK2_PHOSPHO_SITE     PDOC00006
PS00006   122->126  CK2_PHOSPHO_SITE     PDOC00006
PS00006   157->161  CK2_PHOSPHO_SITE     PDOC00006
PS00006   175->179  CK2_PHOSPHO_SITE     PDOC00006
PS00006   322->326  CK2_PHOSPHO_SITE     PDOC00006
PS00008   355->361  MYRISTYL             PDOC00008
PS00009   46->50    AMIDATION            PDOC00009

(No Pfam data available for DKFZphutel_19g22.3)
DKFZphutel_19hl7



group: intracellular transport and trafficking 

DKFZphutel_19hl7 encodes a novel 879 amino acid protein, with similarity to N. crassa osbP
oxysterol-binding protein.

The novel protein contains an oxysterol-binding protein family signature. Mammalian
oxysterol-binding protein (OSBP) is a protein that binds a variety of oxysterols (oxygenated
derivatives of cholesterol). OSBP seems to play a complex role in the regulation of sterol
metabolism. OSBP is a cytosolic/Golgi receptor for oxysterols such as 25-hydroxycholesterol,
and thus a potential target of sphingomyelin turnover and cholesterol mobilization at the
plasma membrane and/or Golgi apparatus. Therefore, the new protein seems to be involved in
oxysterol metabolism.

The new protein can find application in modulating the response of cells to oxysterols. The
protein can be used as a marker for the Golgi system. The protein might be used to direct drugs
to the Golgi system in response to oxidative stress.



strong similarity to C.elegans ZK1086.1 and oxysterol-binding proteins 

complete cDNA, complete cds, few EST hits 

similarity to proteins involved in steroid biosynthesis 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 3828 bp 

Poly A stretch at pos. 3811, polyadenylation signal at pos. 3784



1 GCCCGCGCGC CCGGCCGGCC CGGAGCACCG AGCTCGCGGC ACGGTAGGAG 
51 AAGCCCCCGA GCGCCCACAG CATGAAGGAG GAGGCCTTCC TCCGGCGCCG 
101 CTTCTCCCTG TGTCCACCTT CCTCCACCCC TCAGAAAGTC GACCCCCGGA 
151 AGCTCACCCG GAACTTGCTC CTCAGCGGAG ACAATGAGCT CTACCCACTC 
201 AGCCCAGGGA AGGACATGGA GCCCAACGGC CCGTCGCTGC CCAGGGATGA 
251 AGGGCCCCCG ACCCCAAGCT CTGCCACGAA GGTGCCACCG GCAGAGTACA 
301 GGCTGTGCAA CGGGTCAGAC AAGGAATGTG TGTCCCCCAC CGCCAGGGTC 
351 ACCAAGAAGG AGACTCTCAA GGCGCAGAAG GAGAACTACC GGCAGGAGAA 
401 GAAGCGCGCC ACACGGCAGC TGCTCAGCGC TCTGACAGAC CCCAGCGTGG 
451 TCATCATGGC TGACAGCCTG AAGATCCGCG GCACCCTGAA GAGCTGGACC 
501 AAGCTGTGGT GCGTGCTGAA GCCGGGGGTG CTGCTCATCT ACAAGACGCC 
551 CAAGGTGGGC CAGTGGGTGG GCACGGTGCT GCTGCACTGC TGCGAGCTCA 
601 TCGAGCGGCC CTCCAAGAAG GACGGCTTCT GCTTCAAGCT CTTCCACCCG 
651 CTGGATCAGT CCGTCTGGGC CGTGAAGGGC CCCAAAGGTG AGAGCGTGGG 
701 CTCCATCACA CAGCCCCTGC CCAGCAGCTA CCTGATCTTC AGGGCCGCCT 
751 CCGAGTCAGA TGGTCGCTGC TGGCTGGACG CCCTGGAGCT GGCCCTGCGC 
801 TGCTCTAGCC TACTGAGACT GGGCACCTGC AAGCCGGGCC GAGACGGGGA 
851 GCCAGGGACC TCGCCAGACG CATCACCCTC ATCGCTCTGT GGGCTGCCAG 
901 CCTCAGCCAC TGTCCACCCA GACCAAGACC TGTTCCCACT GAACGGGTCT 
951 TCCCTGGAGA ACGATGCATT CTCAGACAAG TCGGAGAGAG AGAACCCTGA 
1001 GGAGTCAGAT ACCGAGACCC AGGACCATAG CCGGAAGACG GAGAGTGGCA 
1051 GCGACCAGTC AGAGACCCCT GGGGCCCCGG TGCGGAGAGG GACCACCTAT 
1101 GTGGAGCAGG TCCAGGAGGA GCTGGGGGAG CTGGGCGAGG CGTCCCAGGT 
1151 GGAGACAGTG TCAGAGGAGA ACAAGAGTCT GATGTGGACC CTGCTGAAGC 
1201 AGCTACGGCC AGGCATGGAC CTGTCCCGCG TGGTGCTACC CACGTTCGTA 
1251 CTGGAGCCGC GCTCCTTCCT GAACAAGCTC TCCGACTACT ACTACCACGC 
1301 AGACCTGCTC TCCAGGGCTG CGGTGGAGGA GGATGCCTAC AGCCGCATGA 
1351 AGCTGGTGCT GCGGTGGTAC CTGTCTGGCT TCTACAAGAA GCCCAAGGGA 
1401 ATCAAGAAGC CGTACAACCC CATCCTGGGG GAGACCTTCC GCTGCTGCTG 
1451 GTTCCACCCG CAGACTGACA GCCGCACATT CTACATAGCA GAGCAGGTGT 
1501 CCCACCACCC GCCCGTGTCT GCCTTCCACG TCAGCAACCG GAAGGACGGC 
1551 TTCTGCATCA GTGGCAGCAT CACAGCCAAG TCCAGGTTTT ATGGGAACTC 
1601 GCTGTCGGCG CTGCTGGACG GCAAAGCCAC GCTCACCTTC CTGAACCGAG 
1651 CCGAGGATTA CACCCTTACC ATGCCCTACG CCCACTGCAA AGGAATCCTG 
1701 TATGGCACGA TGACCCTGGA GCTGGGTGGG AAGGTCACCA TCGAGTGTGC 
1751 GAAGAACAAC TTCCAGGCCC AGCTGGAATT CAAACTCAAG CCCTTCTTCG 
1801 GGGGTAGCAC CAGCATCAAC CAGATCTCGG GAAAGATCAC GTCGGGAGAG 
1851 GAAGTCCTGG CGAGCCTCAG TGGCCACTGG GACAGGGACG TGTTTATCAA 
1901 GGAGGAAGGG AGCGGAAGCA GTGCGCTTTT CTGGACCCCG AGCGGGGAGG 
1951 TCCGCAGACA GAGGCTGAGG CAGCACACGG TGCCGCTGGA GGAGCAGACG 
2001 GAGCTGGAGT CCGAGAGGCT CTGGCAGCAC GTCACCAGGG CCATCAGCAA 
2051 GGGCGACCAG CACAGGGCCA CACAGGAGAA GTTTGCACTG GAGGAGGCAC 
2101 AGCGGCAGCG GGCCCGTGAG CGGCAGGAGA GCCTCATGCC CTGGAAGCCG 
2151 CAGCTGTTCC ACCTGGACCC CATCACCCAG GAGTGGCACT ACCGATACGA 
2201 GGACCACAGC CCCTGGGACC CCCTGAAGGA CATCGCCCAG TTTGAGCAAG 
2251 ACGGGATCCT GCGGACCTTG CAGCAGGAGG CCGTGGCCCG CCAGACCACC 
2301 TTCCTGGGCA GCCCAGGGCC CAGGCACGAG AGGTCTGGCC CAGACCAGCG
2351 GCTTCGCAAG GCCAGCGACC AGCCCTCCGG CCACAGCCAG GCCACGGAGA 
2401 GCAGCGGATC CACGCCTGAG TCCTGCCCAG AGCTCTCAGA CGAGGAGCAG 
2451 GATGGTGACT TTGTCCCTGG CGGTGAGAGC CCATGCCCTC GGTGCAGGAA 
2501 GGAGGCGCGG CGGCTGCAGG CCCTGCACGA GGCCATCCTC TCCATCCGAG 
2551 AGGCCCAGCA GGAGCTGCAC AGGCACCTCT CGGCCATGCT GAGCTCCACG 
2601 GCACGGGCAG CACAGGCACC GACCCCAGGC CTCCTGCAGA GCCCCCGATC 
2651 CTGGTTCCTG CTCTGCGTGT TCCTGGCGTG TCAGCTGTTC ATTAACCACA 
2701 TCCTCAAATA GGAGCCCTGG GGGCAGAGCT CCTGGCCAGT CCCGAGCCCT 
2751 CCCTCCCAGG CACCCAGCAC TTTAAGCCTG CTCCATGGAG GCAGAGAGGC 
2801 CCGGCAAGCA CAGCCACTGT GACGGGGAGT CCAGGCGCAG GAGGGACCCG 
2851 GGGCCACAAG GCGCTGCGGG CCCAGGTGTG CTGGGCCCCT CTCAGGGGCA 
2901 CTGGCCTCTC TGCAGGGCCT TCCGCCCAGC GCTGGCCTTA ATGCTAAAGC 
2951 CAAATGCAGC TTCTGCTGTG CGACGCACTC CTGGCCATCT TGCCGTGTCA 
3001 CCCCCTGTCC GGCCTCCACT TGCCATGGGG GATGGATGGA TTTAGGGTGG 
3051 GAGGGCCTGT GGGGGCCCTG GACAGTCACA CCCCAGCAGC AGTGAGTGGG 
3101 CAGGTTTGGA GGAGCAGCCA GGGAGCCCCG AGTGGCCCAG GAGTCCCCCC 
3151 ACACACAGAT GCATAGGCCT GCCTTCCGGA GACCCTGTCC ACATTGCCGG 
3201 GACCACCCTG GTGGGGCCAC TGGTGGGTGC CAGGGACAGG TTAGGGCCAC 
3251 TCTGGGGAAG GCATTTTGGT TTTTTATTCC ACGCTCTGCT GTTTGGATGG 
3301 GAGCCCCACA GAGGCAGGTC CTGGAACCAC CCCACCCCCA CACCTGGACG 
3351 CTCGCTCTGG TGGGGGCACA CGCAGGTGGA GGTGGTTGTG GGTGCAGGTG 
3401 TGTGCAGGGG TGTGGGGGGC GCAGGGGTGT GGCTTAGCTG GCCCCGCACC 
3451 CAGGCCGGGG AGGCTCAAGT TCGCCACTTT ACTCAGACCG ATGCACAGTC 
3501 TTCCCATTTT ACACTTTTTT AATAAACATA ATTGCAATAT TTTAGGTGGG 
3551 CTGCGAGCTG CAGTCAGCCT TCACGTCTGG CCTCAGTCCC CGTGTCAGTG 
3601 CCGCTCTGCG TGTGCGTGTG CGCGTGTGTG AGCCTCTACA CATATATATA 
3651 TGTACAGAGC CTTAAACCAC ATCGTGGCGG TGCCGTCTGA GCTGTAGCGG 
3701 GTGGCTTTGT TTCCAGTTTT TGTACCCGTG TCCTTGTCTC CCCTCCTCCC 
3751 CCATCTGGGG ATGTGTCTGT GTTCCACACC TTGAAATAAA CAGACACATA 
3801 CGTGTTCTCT TAAAAAAAAA AAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



98315477: 

The pleckstrin homology domain of oxysterol-binding
protein recognises a determinant specific to Golgi
membranes.

98146266:

A Drosophila homologue of oxysterol binding protein
(OSBP)--implications for the role of
OSBP.




Peptide information for frame 3 



ORF from 72 bp to 2708 bp; peptide length: 879 
Category: strong similarity to known protein 



1 MKEEAFLRRR FSLCPPSSTP QKVDPRKLTR NLLLSGDNEL YPLSPGKDME 

51 PNGPSLPRDE GPPTPSSATK VPPAEYRLCN GSDKECVSPT ARVTKKETLK 

101 AQKENYRQEK KRATRQLLSA LTDPSVVIMA DSLKIRGTLK SWTKLWCVLK

151 PGVLLIYKTP KVGQWVGTVL LHCCELIERP SKKDGFCFKL FHPLDQSVWA 

201 VKGPKGESVG SITQPLPSSY LIFRAASESD GRCWLDALEL ALRCSSLLRL 

251 GTCKPGRDGE PGTSPDASPS SLCGLPASAT VHPDQDLFPL NGSSLENDAF 

301 SDKSERENPE ESDTETQDHS RKTESGSDQS ETPGAPVRRG TTYVEQVQEE 

351 LGELGEASQV ETVSEENKSL MWTLLKQLRP GMDLSRVVLP TFVLEPRSFL 

401 NKLSDYYYHA DLLSRAAVEE DAYSRMKLVL RWYLSGFYKK PKGIKKPYNP 

451 ILGETFRCCW FHPQTDSRTF YIAEQVSHHP PVSAFHVSNR KDGFCISGSI 

501 TAKSRFYGNS LSALLDGKAT LTFLNRAEDY TLTMPYAHCK GILYGTMTLE 

551 LGGKVTIECA KNNFQAQLEF KLKPFFGGST SINQISGKIT SGEEVLASLS 

601 GHWDRDVFIK EEGSGSSALF WTPSGEVRRQ RLRQHTVPLE EQTELESERL 
651 WQHVTRAISK GDQHRATQEK FALEEAQRQR ARERQESLMP WKPQLFHLDP
701 ITQEWHYRYE DHSPWDPLKD IAQFEQDGIL RTLQQEAVAR QTTFLGSPGP 
751 RHERSGPDQR LRKASDQPSG HSQATESSGS TPESCPELSD EEQDGDFVPG 
801 GESPCPRCRK EARRLQALHE AILSIREAQQ ELHRHLSAML SSTARAAQAP 
851 TPGLLQSPRS WFLLCVFLAC QLFINHILK 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphutel_19hl7, frame 3

TREMBL:CEZK1086_2 gene: "ZK1086.1"; Caenorhabditis elegans cosmid
ZK1086, N = 1, Score = 1495, P = 2.7e-153

PIR:S25324 hypothetical protein YKR003w - yeast (Saccharomyces
cerevisiae), N = 2, Score = 574, P = 8.5e-57

TREMBL:CEAF1957 gene: "C32F10.1"; Caenorhabditis elegans cosmid
C32F10., N = 1, Score = 588, P = 8.6e-57

PIR:S46796 hypothetical protein YKR003w homolog YHR001w - yeast
(Saccharomyces cerevisiae), N = 1, Score = 585, P = 1.9e-56

TREMBL:NCOSBP_1 gene: "osbP"; product: "oxysterol-binding protein";
N.crassa mRNA for putative oxysterol-binding protein, N = 1, Score =
571, P = 7e-55

TREMBL:AB017026_1 product: "oxysterol-binding protein"; Mus musculus
mRNA for oxysterol-binding protein, complete cds., N = 2, Score = 328,
P = 3e-35
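In these listings Expect and P are practically identical. Under the usual Poisson treatment of BLAST statistics, the probability of at least one chance hit is P = 1 - exp(-E), which approaches E when E is small; the values here are far below 1, so the two agree to the printed precision. The sketch below shows that conversion only. It assumes E has already been computed (including any sum-statistics adjustment reflected in N), and the function name is hypothetical.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit given expected count E."""
    if expect < 0:
        raise ValueError("expect must be non-negative")
    # expm1 keeps full precision for the very small E values seen in BLAST output.
    return -math.expm1(-expect)
```

For E = 1 the two quantities diverge noticeably (P is about 0.632), which is why BLAST reports them separately.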



>TREMBL:CEZK1086_2 gene: "ZK1086.1"; Caenorhabditis elegans cosmid ZK1086 
Length = 751 

HSPs: 

Score = 1495 (224.3 bits), Expect = 2.7e-153, P = 2.7e-153 
Identities = 327/663 (49%), Positives = 430/663 (64%)

Query: 129 MADSLKIRGTLKSWTKLWCVLKPGVLLIYKTPKV--GQWVGTVLLHCCELIERPSKKDGF 186

MAD+LKIRG LK W + +CVLKPG+L++YK K G WVGTVLL+ CELIERPSKKDGF 
Sbjct: 1 MADTLKIRGALKRWNRYYCVLKPGLLILYKHKKADRGDWVGTVLLNHCELIERPSKKDGF 60 

Query: 187 CFKLFHPLDQSVWAVKGPKGESVGSIT-QPLPSSYLIFRAASESDGRCWLDALELALRCS 245 

CFKLFHP+D S+W +GP G+S GS T PL +S+LI RA S+ GRCW+DALEL+ +C+ 
Sbjct: 61 CFKLFHPMDMSIWGNRGPLGQSFGSFTLNPLNTSFLICRAPSDQAGRCWMDALELSFKCT 120 

Query: 246 SLLRLGTCKPGRDGEPGTSPDASPSSLCGLPASATVHPDQDLFPLNGSSLENDAFSDK-S 304

LL+ T D + G D+S +G ++DD G AS+ + 

Sbjct: 121 GLLKK-TMNE-LDDKNG DSSMND— GQRDESRMSRDSD GDDTRELAVSETDA 168 

Query: 305 ERENPEESDTETQDHSRKTESGSDQSETPGAPVRRGTT YVEQVQEELGELGEASQVE 361 

E+ E D + +DH E G SET +R T ++ +E G G S E 
Sbjct: 169 EKHFQEIDDVQDEDH EDGK-MSETSDT- 1 REAFTESAWI PSPKEVFGPDG- -SLTE 220 

Query: 362 TVSEENKSLMWTLLKQLRPGMDLSRVVLPTFVLEPRSFLNKLSDYYYHADLLSRAAVEED 421 

V EENKSL+WTLLKQ+RPGMDLS+VVLPTF+LEPRSFL KL+DYYYHADL+S A E D 
Sbjct: 221 EVGEENKSLIWTLLKQIRPGMDLSKVVLPTFILEPRSFLEKLADYYYHADLISEAVAEPD 280

Query: 422 AYSRMKLVLRWYLSGFYKKPKGIKKPYNPILGETFRCCWFHPQTDSRTFYIAEQVSHHPP 481 

+ R+ V +++LSGFYKKPKG+KKPYNPILGETFRC W HP S TFY+AEQVSHHPP 
Sbjct: 281 PFQRIVKVTKFFLSGFYKKPKGLKKPYNPILGETFRCKWEHPD-GSTTFYMAEQVSHHPP 339 

Query: 482 VSAFHVSNRKDGFCISGSITAKSRFYGNSLSALLDGKATLTFLNRAEDYTLTMPYAHCKG 541 

VS+ ++NRK GF ISG+I AKS++YGNSLSA+L GK LT LN E Y + +PYA+CKG 
Sbjct: 340 VSSLFITNRKAGFNISGTILAKSKYYGNSLSAILAGKLRLTLLNLGETYIVNLPYANCKG 399 

Query: 542 ILYGTMTLELGGKVTIECAKNNFQAQLEFKLKPFFGGSTSINQISGKITSGEEVLASLSG 601 

I+ GTMT+ELGG+V IEC K ++ L+FKLKP GG+ NQI G I G + LAS+ G
Sbjct: 400 IMIGTMTMELGGEVNIECEKTGYRTTLDFKLKPMLGGA--YNQIEGSIKYGSDRLASIEG 457 

Query: 602 HWDRDVFIKEEGSGSSALFWTPSGEVRRQRLRQHTVPLEEQTELESERLWQHVTRAISKG 661 

WD + IK G W P+ EV + RL ++ + ++EQ E ES +LW+HVT AIS 

Sbjct: 458 AWDGVIRIK — GPDGKKELWNPTPEVIKTRLPRYEINMDEQGEWESAKLWRHVTEAISNE 515 

Query: 662 DQHRATQEKFALEEAQRQRARERQESLMPWKPQLFHLDPITQEWHYRYEDHSPWDPLKDI 721 

DQ++AT+EK ALE QR RA+ S +P + + F ++ Y + D+ PWD DI 

Sbjct: 516 DQYKATEEKTALENDQRARAK SGIPHETKFFKKQH-GDDYVYIHADYRPWDNNNDI 570 
Query: 722 AQFEQDGILRTLQQEAVAR--QTTFLGSPGPRHERSGPDQRLRKASDQPSGHSQATESSG 779 

Q E + +++T+ + + + + LGS E S D+ + +P + + 

Sbjct: 571 QQIENNYVVKTISRHSKRKTGNSEQLGSDNTS-EASESDEEVI EPKIKKKEIVPAK 625 

Query: 780 STPESCPELSDE 791 

S P + PE++DE 
Sbjct: 626 SKPIT-PEVADE 636 



Pedant information for DKFZphutel_19hl7, frame 3 



Report for DKFZphutel_19hl7.3 

[LENGTH] 879 

[MW] 98616.79 

[pI] 7.29 

[HOMOL] TREMBL:CEZK1086_2 gene: "ZK1086.1"; Caenorhabditis elegans cosmid ZK1086 1e-157 

[FUNCAT] 01.06.16 lipid and fatty-acid binding [S. cerevisiae, YHR001w] 3e-55 

[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR001w] 3e-55 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPL145c] 3e-23 

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YPL145c] 3e-23 

[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YAR044w] 5e-20 

[BLOCKS] BL00168F 

[BLOCKS] BL01013D Oxysterol-binding protein family proteins 

[BLOCKS] BL01013C Oxysterol-binding protein family proteins 

[BLOCKS] BL01013B Oxysterol-binding protein family proteins 

[BLOCKS] BL01013A Oxysterol-binding protein family proteins 

[PIRKW] transmembrane protein 1e-19 

[SUPFAM] pleckstrin repeat homology 8e-18 

[SUPFAM] ankyrin repeat homology 1e-19 

[SUPFAM] unassigned ankyrin repeat proteins 1e-19 

[PROSITE] MYRISTYL 12 

[PROSITE] CAMP_PHOSPHO_SITE 6 

[PROSITE] OSBP 1 

[PROSITE] CK2_PHOSPHO_SITE 21 

[PROSITE] PROKAR_LIPOPROTEIN 1 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 20 

[PROSITE] ASN_GLYCOSYLATION 3 

[PFAM] PH (pleckstrin homology) domain 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 2.96 % 

[KW] COILED_COIL 3.53 % 



SEQ MKEEAFLRRRFSLCPPSSTPQKVDPRKLTRNLLLSGDNELYPLSPGKDMEPNGPSLPRDE 

SEG 

PRD ccchhhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccc 

COILS ! 

MEM 

SEQ GPPTPSSATKVPPAEYRLCNGSDKECVSPTARVTKKETLKAQKENYRQEKKRATRQLLSA 

SEG 

PRD cccccccccccccceeeecccccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ LTDPSVVIMADSLKIRGTLKSWTKLWCVLKPGVLLIYKTPKVGQWVGTVLLHCCELIERP 

SEG 

PRD hcccceeeecccccccccccccceeeeeeccceeeeecccccccceeeeecccccccccc 

COILS CCC 

MEM 

SEQ SKKDGFCFKLFHPLDQSVWAVKGPKGESVGSITQPLPSSYLIFRAASESDGRCWLDALEL 

SEG 

PRD ccccceeeeecccccceeeeecccccceeecccccccceeeeeeehhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ ALRCSSLLRLGTCKPGRDGEPGTSPDASPSSLCGLPASATVHPDQDLFPLNGSSLENDAF 

SEG 

PRD hhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 

COILS 

MEM 

SEQ SDKSERENPEESDTETQDHSRKTESGSDQSETPGAPVRRGTTYVEQVQEELGELGEASQV 
SEG xxxxxxxxxxxxx.... 

PRD cccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccccc 

COILS 

MEM 

SEQ ETVSEENKSLMWTLLKQLRPGMDLSRVVLPTFVLEPRSFLNKLSDYYYHADLLSRAAVEE 

SEG 

PRD cccccccchhhhhhhhhhcccccceeeccceeeecccchhhhhhhhhccccccccccccc 

COILS 

MEM 

SEQ DAYSRMKLVLRWYLSGFYKKPKGIKKPYNPILGETFRCCWFHPQTDSRTFYIAEQVSHHP 

SEG 

PRD chhhhhhhhhhhhhhhcccccccccccccccccceeeeeecccccccceeeeeccccccc 

COILS 

MEM 

SEQ PVSAFHVSNRKDGFCISGSITAKSRFYGKSLSALLDGKATLTFLNRAEDYTLTMPYAHCK 

SEG 

PRD cceeeeecccccccccccccccccccccccccccccceeeeeeccccceeeeccccceee 

COILS 

MEM 



SEQ GILYGTMTLELGGKVTIECAKNNFQAQLEFKLKPFFGGSTSINQISGKITSGEEVLASLS 

SEG 

PRD eeeeeccccccccceeeeeccccccceeeecccccccccccceeeeeccccccceeeeec 

COILS 

MEM 

SEQ GHWDRDVFIKEEGSGSSALFWTPSGEVRRQRLRQHTVPLEEQTELESERLWQHVTRAISK 

SEG 

PRD cccccceeeeeccccceeeeeccccccccccccccccccchhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 



SEQ GDQHRATQEKFALEEAQRQRARERQESLMPWKPQLFHLDPITQEWHYRYEDHSPWDPLKD 

SEG xxxxxxxxxxxxx 

PRD cchhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccceeeeccccccccchh 

COILS 

MEM 



SEQ IAQFEQDGILRTLQQEAVARQTTFLGSPGPRHERSGPDQRLRKASDQPSGHSQATESSGS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhccccccccccccchhhhhcccccccccccccccccc 

COILS 

MEM 



SEQ TPESCPELSDEEQDGDFVPGGESPCPRCRKEARRLQALHEAILSIREAQQELHRHLSAML 

SEG 

PRD ccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 



SEQ SSTARAAQAPTPGLLQSPRSWFLLCVFLACQLFINHILK 

SEG 

PRD hhhhhhhcccccccccccceeeeehhhhhhhhhhhhccc 

COILS 

MEM MMMMMMMMMMMMMMMMM . 



Prosite for DKFZphutel_19hl7.3 

PS00001 80->84 ASN_GLYCOSYLATION PDOC00001 
PS00001 291->295 ASN_GLYCOSYLATION PDOC00001 
PS00001 367->371 ASN_GLYCOSYLATION PDOC00001 
PS00004 9->13 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 26->30 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 95->99 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 111->115 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 338->342 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 762->766 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005 
PS00005 90->93 PKC_PHOSPHO_SITE PDOC00005 
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005 
PS00005 98->101 PKC_PHOSPHO_SITE PDOC00005 
PS00005 132->135 PKC_PHOSPHO_SITE PDOC00005 
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005 
PS00005 159->162 PKC_PHOSPHO_SITE PDOC00005 
PS00005 181->184 PKC_PHOSPHO_SITE PDOC00005 
PS00005 252->255 PKC_PHOSPHO_SITE PDOC00005 
PS00005 301->304 PKC_PHOSPHO_SITE PDOC00005 
PS00005 304->307 PKC_PHOSPHO_SITE PDOC00005 
PS00005 320->323 PKC_PHOSPHO_SITE PDOC00005 
PS00005 455->458 PKC_PHOSPHO_SITE PDOC00005 
PS00005 488->491 PKC_PHOSPHO_SITE PDOC00005 
PS00005 501->504 PKC_PHOSPHO_SITE PDOC00005 
PS00005 586->589 PKC_PHOSPHO_SITE PDOC00005 
PS00005 647->650 PKC_PHOSPHO_SITE PDOC00005 
PS00005 824->827 PKC_PHOSPHO_SITE PDOC00005 
PS00005 843->846 PKC_PHOSPHO_SITE PDOC00005 
PS00005 857->860 PKC_PHOSPHO_SITE PDOC00005 
PS00006 82->86 CK2_PHOSPHO_SITE PDOC00006 
PS00006 94->98 CK2_PHOSPHO_SITE PDOC00006 
PS00006 181->185 CK2_PHOSPHO_SITE PDOC00006 
PS00006 227->231 CK2_PHOSPHO_SITE PDOC00006 
PS00006 263->267 CK2_PHOSPHO_SITE PDOC00006 
PS00006 (position illegible in source) CK2_PHOSPHO_SITE PDOC00006 
PS00006 (position illegible in source) CK2_PHOSPHO_SITE PDOC00006 
PS00006 (position illegible in source) CK2_PHOSPHO_SITE PDOC00006 
PS00006 (position illegible in source) CK2_PHOSPHO_SITE PDOC00006 
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006 
PS00006 (position illegible in source) CK2_PHOSPHO_SITE PDOC00006 
PS00006 (position illegible in source) CK2_PHOSPHO_SITE PDOC00006 
PS00006 590->594 CK2_PHOSPHO_SITE PDOC00006 
PS00006 (position illegible in source) CK2_PHOSPHO_SITE PDOC00006 
PS00006 659->663 CK2_PHOSPHO_SITE PDOC00006 
PS00006 713->717 CK2_PHOSPHO_SITE PDOC00006 
PS00006 755->759 CK2_PHOSPHO_SITE PDOC00006 
PS00006 780->784 CK2_PHOSPHO_SITE PDOC00006 
PS00006 784->788 CK2_PHOSPHO_SITE PDOC00006 
PS00006 789->793 CK2_PHOSPHO_SITE PDOC00006 
PS00006 824->828 CK2_PHOSPHO_SITE PDOC00006 
PS00007 402->409 TYR_PHOSPHO_SITE PDOC00007 
PS00007 415->424 TYR_PHOSPHO_SITE PDOC00007 
PS00008 137->143 MYRISTYL PDOC00008 
PS00008 163->169 MYRISTYL PDOC00008 
PS00008 274->280 MYRISTYL PDOC00008 
PS00008 326->332 MYRISTYL PDOC00008 
PS00008 381->387 MYRISTYL PDOC00008 
PS00008 498->504 MYRISTYL PDOC00008 
PS00008 508->514 MYRISTYL PDOC00008 
PS00008 541->547 MYRISTYL PDOC00008 
PS00008 552->558 MYRISTYL PDOC00008 
PS00008 577->583 MYRISTYL PDOC00008 
PS00008 613->619 MYRISTYL PDOC00008 
PS00008 728->734 MYRISTYL PDOC00008 
PS00013 860->871 PROKAR_LIPOPROTEIN PDOC00013 
PS01013 474->485 OSBP PDOC00774 
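Each PKC_PHOSPHO_SITE row above is a match to the PROSITE pattern PS00005, [ST]-x-[RK]. Comparing the rows with the SEQ lines suggests that the coordinates are a 1-based start and an exclusive end: 82->85 spans S82, D83 and K84. The Python sketch below is a minimal illustration under that assumption, not the scanner that produced this report; the names `pkc_sites` and `HL7_1_120` are illustrative. It rescans residues 1-120 and reproduces the first four PKC rows. Python's `re.finditer` returns non-overlapping matches; treating the report the same way is also an assumption.

```python
import re

# PROSITE PS00005 (PKC_PHOSPHO_SITE) pattern [ST]-x-[RK] written as a regex.
PKC_SITE = re.compile(r"[ST].[RK]")


def pkc_sites(protein: str) -> list[str]:
    """Return each match as 'start->end' (1-based start, exclusive end; assumed convention)."""
    sites = []
    for match in PKC_SITE.finditer(protein):  # non-overlapping, scanned left to right
        start = match.start() + 1  # convert 0-based index to 1-based residue number
        sites.append(f"{start}->{start + len(match.group())}")
    return sites


# Residues 1-120 of DKFZphutel_19hl7.3, copied from the first two SEQ lines.
HL7_1_120 = (
    "MKEEAFLRRRFSLCPPSSTPQKVDPRKLTRNLLLSGDNELYPLSPGKDMEPNGPSLPRDE"
    "GPPTPSSATKVPPAEYRLCNGSDKECVSPTARVTKKETLKAQKENYRQEKKRATRQLLSA"
)

print(pkc_sites(HL7_1_120))  # ['82->85', '90->93', '94->97', '98->101']
```

The same loop works for the other patterns in the table once they are translated into regular expressions, for example [RK]{2}.[ST] for CAMP_PHOSPHO_SITE.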



Pfam for DKFZphutel_19hl7.3 



HMM_NAME PH (pleckstrin homology) domain 

HMM *dvIREGWMyKWgswrkstgnWqrRWFvLrndpnrLiYYkddkdekPrYM 
+VI+ +++++G + W + W+VL++ ++L+ YK + + + ++ 

Query 126 VVIMADSLKIRGTLKSWTKLWCVLKP--GVLLIYKTP-KVGQWVG 167 

HMM lldldcWrMidVEidWmmdndHCFilWtrq 

L+C+ +1+ ++ ++ +CF+++ + 
Query 168 TVLLHCCELIERPSKKD GFCFKLFHPLDQSVWAVKGPKGESVGSITQ 214 

HMM rtYYFQAeNeEEMmeWMsalrRalw* 

+ ++F+A++E++ + W++A++ A+ + 
Query 215 PLPSS YLI FRAASESDGRCWLDALELALR 243 
DKFZphutel_19jll 



group: uterus derived 

DKFZphutel_19jll encodes a novel 708 amino acid protein with C-terminal similarity to several 
known proteins, such as human KIAA0231 or murine ras binding protein Sur8. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 



Strong similarity to KIAA0231, similarity to ras binding protein Sur8 

EST AA854189 extends the sequence (294 bp); with this sequence, the cDNA is 
complete. 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2343 bp 

Poly A stretch at pos. 2323, polyadenylation signal at pos. 2295 



1 GCTCCTGCTA ACCCCATCAC TGTGGAAATG AAAGGCCTGA AGACAGATTT 
51 GGACCTTCAG CAGTACAGCT TTATAAATCA GATGTGTTAT GAGCGAGCCC 
101 TCCACTGGTA TGCCAAGTAT TTCCCTTACC TTGTCCTCAT CCATACCCTG 
151 GTCTTTATGC TCTGCAGTAA CTTTTGGTTC AAATTCCCTG GTTCCAGCTC 
201 CAAAATAGAA CATTTCATCT CCATTCTGGG GAAGTGTTTT GACTCTCCTT 
251 GGACCACACG GGCTTTATCT GAAGTGTCTG GGGAGGACTC AGAAGAAAAG 
301 GACAACAGGA AGAACAACAT GAACAGGTCC AACACCATCC AATCTGGTCC 
351 AGAAGGCAGC CTGGTCAACT CTCAGTCTTT AAAGTCCATT CCTGAGAAGT 
401 TTGTAGTTGA TAAATCCACT GCAGGGGCTC TGGATAAAAA GGAAGGTGAG 
451 CAGGCTAAGG CCTTATTTGA GAAGGTGAAG AAGTTCAGGC TGCATGTGGA 
501 AGAAGGTGAT ATTCTATATG CCATGTATGT TCGCCAGACT GTACTTAAAG 
551 TTATCAAATT CCTAATCATC ATTGCATATA ATAGTGCTCT GGTTTCCAAG 
601 GTCCAGTTTA CAGTGGACTG TAATGTGGAC ATTCAGGACA TGACTGGATA 
651 TAAAAACTTT TCTTGCAATC ATACCATGGC ACACTTGTTC TCAAAACTGT 
701 CCTTTTGCTA TCTGTGCTTT GTTAGTATCT ATGGATTGAC GTGCCTTTAT 
751 ACCTTATACT GGCTGTTCTA CCGTTCTCTA CGGGAATATT CCTTTGAGTA 
801 TGTCCGTCAG GAGACTGGAA TTGATGATAT TCCAGATGTG AAAAATGACT 
851 TTGCTTTTAT GCTTCATATG ATAGATCAGT ATGACCCTCT CTATTCCAAG 
901 AGATTTGCAG TGTTCCTGTC TGAAGTCAGT GAAAACAAAT TAAAGCAGCT 
951 GAACTTAAAT AACGAATGGA CTCCTGATAA ACTGAGGCAG AAGCTACAGA 
1001 CAAATGCCCA TAATCGACTG GAATTGCCTC TTATCATGCT CTCTGGCCTT 
1051 CCAGACACTG TTTTTGAAAT CACAGAGTTG CAATCTCTAA AACTTGAAAT 
1101 CATTAAGAAC GTAATGATAC CAGCCACCAT TGCACAGCTA GACAATCTTC 
1151 AAGAGCTCTC TCTGCACCAG TGTTCTGTCA AAATCCACAG TGCGGCGCTC 
1201 TCTTTCCTGA AGGAAAACCT CAAGGTCTTG AGCGTCAAGT TTGATGACAT 
1251 GAGGGAACTC CCCCCCTGGA TGTATGGGCT CCGAAATCTG GAAGAGCTGT 
1301 ACCTAGTTGG CTCTCTAAGT CATGATATTT CCAGAAATGT CACCCTTGAG 
1351 TCTCTGCGGG ATCTCAAAAG CCTTAAAATT CTCTCTATCA AAAGCAACGT 
1401 TTCCAAAATC CCTCAGGCAG TGGTTGATGT TTCCAGCCAT CTCCAGAAGA 
1451 TGTGCATACA TAATGATGGC ACCAAGCTGG TGATGCTCAA CAACTTAAAG 
1501 AAGATGACCA ATCTGACAGA GCTGGAGCTG GTCCACTGTG ACCTGGAGCG 
1551 TATTCCTCAT GCTGTGTTCA GCCTACTCAG CCTCCAGGAA TTGGACCTGA 
1601 AGGAAAACAA TCTGAAATCT ATAGAAGAAA TCGTTAGCTT TCAGCACTTA 
1651 AGAAAGTTGA CAGTGCTAAA ACTGTGGCAT AACAGCATCA CCTACATCCC 
1701 AGAGCATATA AAGAAACTCA CCAGCCTGGA ACGCCTGTCC TTTAGTCACA 
1751 ATAAAATAGA GGTGCTGCCT TCCCACCTCT TCCTATGCAA CAAGATCCGA 
1801 TACTTGGACT TATCGTACAA TGACATTCGA TTTATCCCCC CTGAAATTGG 
1851 AGTTCTACAA AGTTTACAGT ATTTTTCCAT CACATGTAAC AAAGTGGAAA 
1901 GCCTTCCAGA TGAACTCTAC TTCTGCAAGA AACTTAAAAC TCTGAAGATT 
1951 GGAAAAAACA GCCTATCTGT ACTTTCACCG AAAATTGGAA ATTTGCTATT 
2001 TCTTTCCTAC TTAGATGTAA AAGGTAATCA CTTTGAAATC CTCCCTCCTG 
2051 AACTGGGTGA CTGTCGGGCT CTGAAGCGAG CTGGTTTAGT TGTAGAAGAT 
2101 GCTCTGTTTG AAACTCTGCC TTCTGACGTC CGGGAGCAAA TGAAAACAGA 
2151 ATAACTTATT TTTCGTTAAA GTTTGACTGA AACACGCTTC TACCAAATAC 
2201 AGTATAAATA ATTAGGTAGT CTTAATGCCT TTCCTATTTT TTTTTCCTTT 
2251 TCACACAAAA TGTACACAAA GATCGCGTAA GGAGTATGTA TTTTTAATAA 
2301 AAATTTAATT GTATTTTTTC AATATTAAAA AAAAAAAAAA AAA 
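The reported polyadenylation-signal position looks like a 0-based offset: the canonical AATAAA hexamer begins at base 2296 of the 1-based listing (the last five bases of row 2251 plus the first base of row 2301), one more than the reported 2295. The Python sketch below assumes that AATAAA is the signal the annotation refers to (the record does not say which hexamer was searched for); the helper name `signal_offset` is illustrative. It locates the hexamer using only the last two rows of the listing.

```python
# Last two rows of the DKFZphutel_19jll listing (1-based labels 2251 and 2301).
ROW_2251 = "TCACACAAAA" "TGTACACAAA" "GATCGCGTAA" "GGAGTATGTA" "TTTTTAATAA"
ROW_2301 = "AAATTTAATT" "GTATTTTTTC" "AATATTAAAA" "AAAAAAAAAA" "AAA"
ROW_2251_OFFSET = 2250  # 0-based insert offset of the first base of row 2251


def signal_offset(tail: str, tail_offset: int, hexamer: str = "AATAAA") -> int:
    """0-based insert offset of the hexamer occurrence closest to the 3' end of `tail`."""
    idx = tail.rfind(hexamer)
    if idx == -1:
        raise ValueError(f"{hexamer} not found")
    return tail_offset + idx


offset = signal_offset(ROW_2251 + ROW_2301, ROW_2251_OFFSET)
print(offset, "->", "1-based position", offset + 1)  # 2295 -> 1-based position 2296
```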



BLAST Results 



No BLAST result 
Medline entries 



96421675: 

Characterization of densin-180, a new brain-specific synaptic protein of the 
O-sialoglycoprotein family. 

98337190: 

SUR-8, a conserved Ras-binding protein with leucine-rich repeats, positively 
regulates Ras-mediated signaling in C. elegans. 



Peptide information for frame 1 



ORF from 28 bp to 2151 bp; peptide length: 708 
Category: similarity to known protein 
Classification: Cell signaling/communication 



1 MKGLKTDLDL QQYSFINQMC YERALHWYAK YFPYLVLIHT LVFMLCSNFW 
51 FKFPGSSSKI EHFISILGKC FDSPWTTRAL SEVSGEDSEE KDNRKNNMNR 
101 SNTIQSGPEG SLVNSQSLKS IPEKFVVDKS TAGALDKKEG EQAKALFEKV 
151 KKFRLHVEEG DILYAMYVRQ TVLKVIKFLI IIAYNSALVS KVQFTVDCNV 
201 DIQDMTGYKN FSCNHTMAHL FSKLSFCYLC FVSIYGLTCL YTLYWLFYRS 
251 LREYSFEYVR QETGIDDIPD VKNDFAFMLH MIDQYDPLYS KRFAVFLSEV 
301 SENKLKQLNL NNEWTPDKLR QKLQTNAHNR LELPLIMLSG LPDTVFEITE 
351 LQSLKLEIIK NVMIPATIAQ LDNLQELSLH QCSVKIHSAA LSFLKENLKV 
401 LSVKFDDMRE LPPWMYGLRN LEELYLVGSL SHDISRNVTL ESLRDLKSLK 
451 ILSIKSNVSK IPQAVVDVSS HLQKMCIHND GTKLVMLNNL KKMTNLTELE 
501 LVHCDLERIP HAVFSLLSLQ ELDLKENNLK SIEEIVSFQH LRKLTVLKLW 
551 HNSITYIPEH IKKLTSLERL SFSHNKIEVL PSHLFLCNKI RYLDLSYNDI 
601 RFIPPEIGVL QSLQYFSITC NKVESLPDEL YFCKKLKTLK IGKNSLSVLS 
651 PKIGNLLFLS YLDVKGNHFE ILPPELGDCR ALKRAGLVVE DALFETLPSD 
701 VREQMKTE 
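The ORF coordinates and the peptide length are consistent if the ORF is read as 1-based and inclusive, with the stop codon excluded. The last codon, GAA (E708), ends at base 2151 and is followed by TAA at 2152-2154, so (2151 - 28 + 1) / 3 = 708. The same arithmetic gives 594 residues for the DKFZphutel_li2 ORF (224 to 2005 bp). The Python sketch below assumes the standard genetic code (NCBI translation table 1); the helper names are illustrative. It checks both lengths and translates the first 24 codons of the listing.

```python
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def orf_length_aa(start: int, end: int) -> int:
    """Residues encoded by a 1-based, inclusive ORF that does not include the stop codon."""
    length_nt = end - start + 1
    if length_nt % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return length_nt // 3


# First 100 nt of the DKFZphutel_19jll insert, copied from the listing above.
INSERT_5PRIME = ("GCTCCTGCTAACCCCATCACTGTGGAAATGAAAGGCCTGAAGACAGATTT"
                 "GGACCTTCAGCAGTACAGCTTTATAAATCAGATGTGTTATGAGCGAGCCC")

print(orf_length_aa(28, 2151))          # 708
print(translate(INSERT_5PRIME[27:99]))  # MKGLKTDLDLQQYSFINQMCYERA
```

The slice [27:99] is the 0-based form of bases 28 to 99, so the first translated codon is the ATG at 28-30.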



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_19jll, frame 1 

TREMBL:HSD984_1 gene: "KIAA0231"; Human mRNA for KIAA0231 gene, 
partial cds., N = 1, Score = 1408, P = 4.5e-144 

TREMBL:AF054827_1 gene: "soc-2"; product: "leucine-rich repeat protein 
SOC-2"; Caenorhabditis elegans leucine-rich repeat protein SOC-2 
(soc-2) mRNA, complete cds., N = 1, Score = 304, P = 5.7e-24 

TREMBL:RNU66707_1 product: "densin-180"; Rattus norvegicus densin-180 
mRNA, complete cds., N = 1, Score = 311, P = 7.4e-24 

TREMBL:AF068921_1 product: "Ras-binding protein SUR-8"; Mus musculus 
Ras-binding protein SUR-8 mRNA, complete cds., N = 1, Score = 302, P = 
1.1e-23 



>TREMBL:HSD984_1 gene: "KIAA0231"; Human mRNA for KIAA0231 gene, partial 
cds. 

Length = 476 



HSPs: 



Score = 1408 (211.3 bits), Expect = 4.5e-144, P = 4.5e-144 
Identities = 265/471 (56%), Positives = 361/471 (76%) 

Query: 237 LTCLYTLYWLFYRSLREYSFEYVRQETGIDDIPDVKNDFAFMLHMIDQYDPLYSKRFAVF 296 

LT Y+L+W+ SL++YSFE +R+++ DIPDVKNDFAF+LH+ DQYDPLYSKRF++F 
Sbjct: 1 LTSSYSLWWMLRSSLKQYSFEALREKSNYSDIPDVKNDFAFILHLADQYDPLYSKRFSIF 60 

Query: 297 LSEVSENKLKQLNLNNEWTPDKLRQKLQTNAHNRLELPLIMLSGLPDTVFEITELQSLKL 356 

LSEVSENKLKQ+NLNNEWT +KL+ KL NA +++EL L ML+GLPD VFE+TE++ L L 
Sbjct: 61 LSEVSENKLKQINLNNEWTVEKLKSKLVKNAQDKIELHLFMLNGLPDNVFELTEMEVLSL 120 
Query: 357 EIIKNVMIPATIAQLDNLQELSLHQCSVKIHSAALSFLKENLKVLSVKFDDMRELPPWMY 416 

E+I V +P+ + +QL NL+EL ++ S+ + AL+FL+ENLK+L +KF +M ++P W++ 
Sbjct: 121 ELIPEVKLPSAVSQLVNLKELRVYHSSLVVDHPALAFLEENLKILRLKFTEMGKIPRWVF 180 

Query: 417 GLRNLEELYLVGSLSHDISRNVTLESLRDLKSLKILSIKSNVSKIPQAVVDVSSHLQKMC 476 

L+NL+ELYL G + + + LE +DLK+L+ L +KS++S+IPQ V D+ LQK+ 
Sbjct: 181 HLKNLKELYLSGCVLPEQLSTMQLEGFQDLKNLRTLYLKSSLSRIPQVVTDLLPSLQKLS 240 

Query: 477 IHNDGTKLVMLNNLKKMTNLTELELVHCDLERIPHAVFSLLSLQELDLKENNLKSIEEIV 536 

+ N+G+KLV+LNNLKKM NL LEL+ CDLERI PH++FSL +L ELDL+ENNLK++EEI+ 
Sbjct: 241 LDNEGSKLVVLNNLKKMVNLKSLELISCDLERIPHSIFSLNNLHELDLRENNLKTVEEII 300 

Query: 537 SFQHLRKLTVLKLWHNSITYIPEHIKKLTSLERLSFSHNKIEVLPSHLFLCNKIRYLDLS 596 

SFQHL+ L+ LKLWHN+I YIP I L++LE+LS HN IE LP LFLC K+ YLDLS 
Sbjct: 301 SFQHLQNLSCLKLWHNNIAYIPAQIGALSNLEQLSLDHNNIENLPLQLFLCTKLHYLDLS 360 

Query: 597 YNDIRFIPPEIGVLQSLQYFSITCNKVESLPDELYFCKKLKTLKIGKNSLSVLSPKIGNL 656 

YN + FIP EI L +LQYF++T N +E LPD L+ CKKL+ L +GKNSL LSP +G L 
Sbjct: 361 YNHLTFIPEEIQYLSNLQYFAVTNNNIEMLPDGLFQCKKLQCLLLGKNSLMNLSPHVGEL 420 

Query: 657 LFLSYLDVKGNHFEILPPELGDCRALKRAGLVVEDALFETLPSDVREQMKT 707 

L++L++ GN+ E LPPEL C++LKR L+VE+ L TLP V E+++T 
Sbjct: 421 SNLTHLELIGNYLETLPPELEGCQSLKRNCLIVEENLLNTLPLPVTERLQT 471 
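The percentages after each Identities and Positives ratio in these reports appear to be truncated rather than rounded. For example, 361/471 is 76.6% but is printed as 76%, and 430/663 is 64.9% but is printed as 64%. The quick check below uses only ratios copied from the HSP headers in this section; the name `truncated_percent` is illustrative, and the floor rule is inferred from these numbers rather than taken from BLAST documentation.

```python
# (identical or positive residues, aligned length, percentage as printed),
# copied from HSP headers in this section.
HSP_RATIOS = [
    (327, 663, 49), (430, 663, 64),  # DKFZphutel_19hl7, first HSP shown
    (265, 471, 56), (361, 471, 76),  # DKFZphutel_19jll vs. KIAA0231
    (96, 268, 35), (158, 268, 58),   # DKFZphutel_li2 vs. KMHB_DICDI
]


def truncated_percent(count: int, length: int) -> int:
    """Integer percentage with the fractional part dropped (floor), not rounded."""
    return 100 * count // length


for count, length, printed in HSP_RATIOS:
    exact = 100 * count / length
    print(f"{count}/{length}: exact {exact:.2f}%, "
          f"floor {truncated_percent(count, length)}%, printed {printed}%")
```

Ordinary rounding would print 77% for 361/471 and 59% for 158/268, which does not match the reports.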

Pedant information for DKFZphutel_19jll, frame 1 



Report for DKFZphutel_19jll.1 

[LENGTH] 708 

[MW] 81812.82 

[pI] 7.55 

[HOMOL] TREMBL:HSD984_1 gene: "KIAA0231"; Human mRNA for KIAA0231 gene, partial cds. 
1e-149 

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YJL005w] 3e-17 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YJL005w] 3e-17 

[FUNCAT] 10.04.03 second messenger formation [S. cerevisiae, YJL005w] 3e-17 

[FUNCAT] 01.03.10 metabolism of cyclic and unusual nucleotides [S. cerevisiae, 
YJL005w] 3e-17 

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YJL005w] 3e-17 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL193c] 3e-09 

[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, 
palmitylation, farnesylation and processing) [S. cerevisiae, YKL193c] 3e-09 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YAL021c] 9e-08 

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YAL021c] 
9e-08 

[FUNCAT] 01.01.04 regulation of amino-acid metabolism [S. cerevisiae, YAL021c] 
9e-08 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR353c] 3e-07 

[BLOCKS] BL00868F 

[BLOCKS] BL00985B Spermadhesins family proteins 

[EC] 3.4.17.3 Lysine carboxypeptidase 1e-08 

[EC] 4.6.1.1 Adenylate cyclase 3e-18 

[PIRKW] blocked amino end 1e-10 

[PIRKW] phosphotransferase 1e-09 

[PIRKW] nucleus 6e-08 

[PIRKW] duplication 3e-18 

[PIRKW] platelet 1e-10 

[PIRKW] tandem repeat 7e-16 

[PIRKW] keratan sulfate 7e-07 

[PIRKW] metallo-carboxypeptidase 1e-08 

[PIRKW] transmembrane protein 1e-10 

[PIRKW] serine/threonine-specific protein kinase 1e-09 

[PIRKW] autophosphorylation 1e-09 

[PIRKW] cartilage 7e-07 

[PIRKW] connective tissue 7e-07 

[PIRKW] magnesium 1e-09 

[PIRKW] cAMP biosynthesis 3e-18 

[PIRKW] ATP 1e-09 

[PIRKW] receptor 1e-09 

[PIRKW] leucine zipper 3e-13 

[PIRKW] glycoprotein 5e-12 

[PIRKW] extracellular matrix 7e-07 

[PIRKW] chondroitin sulfate proteoglycan 7e-07 

[PIRKW] cell adhesion 1e-08 

[PIRKW] hydrolase 1e-08 

[PIRKW] sulfoprotein 7e-07 

[PIRKW] membrane protein 1e-08 

[PIRKW] phosphorus-oxygen lyase 3e-18 

[PIRKW] collagen binding 7e-07 

[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 3e-21 

[SUPFAM] chaoptin 1e-08 

[SUPFAM] gelsolin repeat homology 3e-21 

[SUPFAM] protein kinase homology 1e-09 

[SUPFAM] protein kinase Xa21 1e-09 

[SUPFAM] fibromodulin 4e-12 

[SUPFAM] yeast adenylate cyclase catalytic domain homology 3e-18 

[SUPFAM] yeast adenylate cyclase 3e-18 

[KW] TRANSMEMBRANE 3 

[KW] LOW_COMPLEXITY 1.41 % 



SEQ MKGLKTDLDLQQYSFINQMCYERALHWYAKYFPYLVLIHTLVFMLCSNFWFKFPGSSSKI 

SEG 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhccceeeeccccccee 

MEM MMMMMMMMMMMMMMMMM 

SEQ EHFISILGKCFDSPWTTRALSEVSGEDSEEKDNRKNNMNRSNTIQSGPEGSLVNSQSLKS 

SEG 

PRD eeeeeeeecccccccceeeeecccccccccccccccccccccccccccccceeeeccccc 



MEM 



SEQ IPEKFVVDKSTAGALDKKEGEQAKALFEKVKKFRLHVEEGDILYAMYVRQTVLKVIKFLI 

SEG 

PRD cccceeecccccccccchhhhhhhhhhhhhhhhhhhhcccceeeehhhhhhhhhhhhhhh 

MEM MMMMMMMMM 

SEQ IIAYNSALVSKVQFTVDCNVDIQDMTGYKNFSCNHTMAHLFSKLSFCYLCFVSIYGLTCL 

SEG 

PRD hhhhcchhhhheeeeeccccccccccccccccccchhhhhhhhheeeeeeeeeeccceee 

MEM MMMMMMMM MMMMMMMMMMMMMMMMM 

SEQ YTLYWLFYRSLREYSFEYVRQETGIDDIPDVKNDFAFMLHMIDQYDPLYSKRFAVFLSEV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhcccchhhhhhhhhhhhh 

MEM 

SEQ SENKLKQLNLNNEWTPDKLRQKLQTNAHNRLELPLIMLSGLPDTVFEITELQSLKLEIIK 

SEG . .xxxxxxxxxx 

PRD hhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhh 

MEM 

SEQ NVMIPATIAQLDNLQELSLHQCSVKIHSAALSFLKENLKVLSVKFDDMRELPPWMYGLRN 

SEG 

PRD hccccccchhhhhhhhhhhhccccccccccccchhhhhhhhhhccccccccccccchhhh 

MEM 

SEQ LEELYLVGSLSHDISRNVTLESLRDLKSLKILSIKSNVSKIPQAVVDVSSHLQKMCIHND 

SEG 

PRD hhhhhhccccccccccccccchhhhhhhhhhhhcccccccccccchhhhhhhhhhhcccc 

MEM 

SEQ GTKLVMLNNLKKMTNLTELELVHCDLERIPHAVFSLLSLQELDLKENNLKSIEEIVSFQH 

SEG 

PRD ceeeecccccccchhhhhhhhhccccccccccchhhhhhhhhhhccccccccccccccch 

MEM 

SEQ LRKLTVLKLWHNSITYIPEHIKKLTSLERLSFSHNKIEVLPSHLFLCNKIRYLDLSYNDI 

SEG 

PRD hhhhhhhcccccceeecccccchhhhhheeeccccceeecccccchhhhhhhhhhccccc 

MEM 

SEQ RFIPPEIGVLQSLQYFSITCNKVESLPDELYFCKKLKTLKIGKNSLSVLSPKIGNLLFLS 

SEG 

PRD cccccccchhhhhhhhhhhccccccccccccchhhhhcccccccceeecccccccchhhh 

MEM 

SEQ YLDVKGNHFEILPPELGDCRALKRAGLVVEDALFETLPSDVREQMKTE 

SEG 

PRD hhhccccccccccccchhhhhhhhheeeeccccccccccccccccccc 

MEM 



(No Prosite data available for DKFZphutel_19jll.1) 
(No Pfam data available for DKFZphutel_19jll.1) 
DKFZphutel_li2 

group: transcription factor 

DKFZphutel_li2 encodes a novel 594 amino acid protein similar to signal-transducing proteins. 

The protein contains two WD-40 repeats, which are typical of the beta subunit (beta-transducin) 
of G proteins. In addition, the protein contains a C3HC4 zinc finger and a leucine zipper. G-protein 
beta subunits appear to be required for the replacement of GDP by GTP, as well as for membrane 
anchoring and receptor recognition. Given the zinc finger, the novel protein may be a new 
molecule involved in both signal transduction and transcription. 

The new protein can find application in modulating or blocking the expression of genes 
controlled by this molecule. 

Similarity to Dictyostelium myosin heavy chain kinase 

complete cDNA, complete cds, EST hits 

[PFAM] Zinc finger, C3HC4 type (RING finger) 

[PFAM] WD domain, G-beta repeats 

[SCOP] d1tbgc_ 2.46.3.1.1 beta1-subunit of the 

signal-transducing G protein 3e-07 

Sequenced by BMFZ 

Locus: /map="16p13.3" 

Insert length: 3584 bp 

Poly A stretch at pos. 3555, polyadenylation signal at pos. 3537 

1 GGGCGGGAGG TGCTTCCCAA GGACCGTAGA TGCCTCTCTA GAGCATGAGC 
51 TCAGGCAAGA GTGCCCGCTA CAACCGCTTC TCCGGGGGGC CCAGCAATCT 
101 TCCCACCCCA GACGTCACCA CAGGGACCAG AATGGAAACG ACCTTCGGAC 
151 CCGCCTTTTC AGCCGTCACC ACCATCACAA AAGCTGACGG GACCAGCACC 
201 TACAAGCAGC ACTGCAGGAC AGCATGCCCC CCATCAGCAC TCCCCGCCGC 
251 TCCGACTCCG CCATCTCTGT CCGCTCCCTG CACTCAGAGT CCAGCATGTC 
301 TCTGCGCTCC ACATTCTCAC TGCCCGAGGA GGAGGAGGAG CCGGAGCCAC 
351 TGGTGTTTGC GGAGCAGCCC TCGGTGAAGC TGTGCTGTCA GCTCTGCTGC 
401 AGCGTCTTCA AAGACCCCGT GATCACCACG TGTGGGCACA CGTTCTGTAG 
451 GAGATGCGCC TTGAAGTCAG AGAAGTGTCC CGTGGACAAC GTCAAACTGA 
501 CCGTGGTGGT GAACAACATC GCGGTGGCCG AGCAGATCGG GGAGCTCTTC 
551 ATCCACTGCC GGCACGGCTG CCGGGTAGCG GGCAGCGGGA AGCCCCCCAT 
601 CTTTGAGGTG GACCCCCGAG GGTGCCCCTT CACCATCAAG CTCAGCGCCC 
651 GGAAGGACCA CGAGGGCAGC TGTGACTACA GGCCTGTGCG GTGTCCCAAC 
701 AACCCCAGCT GCCCCCCGCT GCTCAGGATG AACCTGGAGG CCCACCTCAA 
751 GGAGTGCGAG CACATCAAAT GCCCCCACTC CAAGTACGGG TGCACGTTCA 
801 TCGGGAACCA GGACACTTAC GAGACCCACC TGGAGACTTG CCGCTTCGAG 
851 GGCCTGAAGG AGTTTCTGCA GCAGACGGAT GACCGCTTCC ACGAGATGCA 
901 CGTGGCTCTG GCCCAGAAGG ACCAGGAGAT CGCCTTCCTG CGCTCCATGC 
951 TGGGAAAGCT CTCGGAGAAG ATCGACCAGC TAGAGAAGAG CCTGGAGCTC 
1001 AAGTTTGACG TCCTGGACGA AAACCAGAGC AAGCTCAGCG AGGACCTCAT 
1051 GGAGTTCCGG CGGGACGCAT CCATGTTAAA TGACGAGCTG TCCCACATCA 
1101 ACGCGCGGCT GAACATGGGC ATCCTAGGCT CCTACGACCC TCAGCAGATC 
1151 TTCAAGTGCA AAGGGACCTT TGTGGGCCAC CAGGGCCCTG TGTGGTGTCT 
1201 CTGCGTCTAC TCCATGGGTG ACCTGCTCTT CAGTGGCTCC TCTGACAAGA 
1251 CCATCAAGGT GTGGGACACA TGTACCACCT ACAAGTGTCA GAAGACACTG 
1301 GAGGGCCATG ATGGCATCGT GCTGGCTCTC TGCATCCAGG GGTGCAAACT 
1351 CTACAGCGGC TCTGCAGACT GCACCATCAT TGTGTGGGAC ATCCAGAACC 
1401 TGCAGAAGGT GAACACCATC CGGGCCCATG ACAACCCGGT GTGCACGCTG 
1451 GTCTCCTCAC ACAACGTGCT CTTCAGCGGC TCCCTGAAGG CCATCAAGGT 
1501 CTGGGACATC GTGGGCACTG AGCTGAAGTT GAAGAAGGAG CTCACAGGCC 
1551 TCAACCACTG GGTGCGGGCC CTGGTGGCTG CCCAGAGCTA CCTGTACAGC 
1601 GGCTCCTACC AGACAATCAA GATCTGGGAC ATCCGAACCC TTGACTGCAT 
1651 CCACGTCCTG CAGACGTCTG GTGGCAGCGT CTACTCCATT GCTGTGACAA 
1701 ATCACCACAT TGTCTGTGGC ACCTACGAGA ACCTCATCCA CGTGTGGGAC 
1751 ATTGAGTCCA AGGAGCAGGT GCGGACCCTC ACGGGCCACG TGGGCACCGT 
1801 GTATGCCCTG GCGGTCATCT CGACGCCAGA CCAGACCAAA GTCTTCAGTG 
1851 CATCCTACGA CCGGTCCCTC AGGGTCTGGA GTATGGACAA CATGATCTGC 
1901 ACGCAGACCC TGCTGCGTCA CCAGGGCAGT GTCACCGCGC TGGCTGTGTC 
1951 CCGGGGCCGA CTCTTCTCAG GGGCTGTGGA TAGCACTGTG AAGGTTTGGA 
2001 CTTGCTAACA GGATCCAGGC CAGGCTGTGG TTTCCCCTGA ACCAGCCCTG 
2051 GACCTTTCTG AGCCAGGCTG GCCACATGGG GTGGTCTCGG GGTTTCTGCC 
2101 TGCCCCGTGG GCATAGGTGG ACAGGCTCTG GCAGCCGGGC AGTGCCCTCC 
2151 CCGTCCCATG CTCGGCGAGC CTCCCTCTAC TCGGCACTGT CCTTGCTGCC 
2201 CAGCCCCTCT CTGGGTGCCA GGTACGACGC TTGCCCCGGC CCACCCTCCA 
2251 TCCCCACCCT CCATCCCCAC CCTAGATGGA GCGAGGGCCT TTTTACTCAC 
2301 CTTTTCTACC GTTTTTAGAC TGTATGTAGA TTTGGTTACC TCCTGGTTGA 
2351 AATAAATGCT CCACAGACTG TGGCTGTGAG TGGGGACAGC TCCTCGGGAC 
2401 AAGGGGGCTG TGTGTGGCCT TGAGGTTGGT GTGCACAGGC ACTGGCTGCT 
2451 GTGAGTGGGG GGGCATGGGG CAGTTTCCTT TGGTGGACCC CAGGACTTCG 
2501 GCCCACTCCG GGGCCTCCCC TCCCTGCTAG GAGGCAACTC GTCACACCCA 
2551 AGCTGCTGGC CTCCAGTCCC ATCTCCCCCA ACACATGTGC CCCCAAAAAG 
2601 TGAGCCAGGC ACCTCTGTTT CCTGCTGTTT ATTGACAGCC GACGGCAGCG 
2651 CCTTGCCCAG ACCTCCCCTG CCCACCTGCT GGAGCCCAGC CTGTGCCGCC 
2701 CTCTGAGGAG AGGCCTGGGG GGACAGCTGG GCACGTCCAC TCGCAGGGAA 
2751 ACACGGGGTG AGACAGCAGG AAGGGGCCCT GCACGCCGGG ACGCCACCTC 
2801 CGCCAGCCGC CTCCACCCGC CCCACACCAC AATCGCTGGT TTTCGGCATT 
2851 TTTTAAATTT TTTTTTTAAG AAACGTCAAA GTTGTGCCCA ACACTGTGGA 
2901 TCAGCAAACA CGATAGAGGA GACCAGTCAG TACTTCTTGG AGGGGGCAGG 
2951 AGGAGAGAGG AAAAGGGAGG GCGAGAATGA CCACACAACA CAGCCTTGGA 
3001 CCATGAGCAG AAGCGTCCGT GGGAACTCCA CTGGGGTGGA TGGGCTGCCT 
3051 GCACAGCCCC TGGAGAGGGG GCCAGGCACA CCCTCAGAGG AGCTGCAAGC 
3101 CCGTGGCCTG GCCTGCTACA TGCCCTGCTT CCACGTGGCT GCCACGCTGA 
3151 CACACCCACA TTCACCAAAC CCACCCGCGC CCTGGGACGC AGCCACGCCA 
3201 GGAGGAGGAC ACGGCCGCCG AGAGCAAGGC ACAACCTCGA GTTCTTGGGG 
3251 CGCAGAGAAC TTAGGAGAGA AGCACGGAGG AGCCCCCGGC AGAGCACCCG 
3301 CCCCCGGGCC CCAGCCTTCC ACCTGTGCTA GCAGCCTGGG GCCTCCACTC 
3351 TGGCCGGAGG AAGGACCGCA GGCAGACAGC CTGGGCCTCT AACAGCTTTT 
3401 GTCCGGAGCT AGACTTCGTG TCCTTTCAGT TGGTAAATGG TTTTCTATAG 
3451 AATCAATAAT ATTTCTTTCT TTAAATATAT ATTTGTTAAA GTTATACCTT 
3501 TTTGTTTCTC TGGGGAAATC CGCCTCAGCT CATTCCCAAT AAATTAATAC 
3551 TCTTGATAAA AAAAAAAAAA AGAAAAAAAA AAAA 



BLAST Results 



Entry HSBE from database EMBL: 

Homo sapiens (clone exon trap d5) chromosome 16p13.3 gene, exon. 
Score = 2375, P = 7.1e-101, identities = 475/475 

Entry HSBD from database EMBL: 

Homo sapiens (clone exon trap d32) chromosome 16p13.3 gene, exon. 
Score = 876, P = 3.0e-31, identities = 176/177 



Medline entries 



95122486: 

Structural analysis of myosin heavy chain kinase A from Dictyostelium. Evidence for a 
highly divergent protein kinase domain, an amino-terminal coiled-coil domain, and a 
domain homologous to the beta-subunit of heterotrimeric G proteins. 

96149460: 

Dictyostelium myosin heavy chain kinase A regulates myosin localization during growth 
and development. 

97277316: 

Identification of a protein kinase from Dictyostelium with homology to the novel 
catalytic domain of myosin heavy chain kinase A. 

96009891: 

A gene responsible for vegetative incompatibility in the fungus Podospora anserina 
encodes a protein with a GTP-binding motif and G beta homologous domain. 



Peptide information for frame 2 



ORF from 224 bp to 2005 bp; peptide length: 594 
Category: similarity to known protein 
Prosite motifs: ZINC_FINGER_C3HC4 (70-80) 
LEUCINE_ZIPPER (436-458) 
G_BETA_REPEATS (335-355) 
G_BETA_REPEATS (376-391) 
1 MPPISTPRRS DSAISVRSLH SESSMSLRST FSLPEEEEEP EPLVFAEQPS 
51 VKLCCQLCCS VFKDPVITTC GHTFCRRCAL KSEKCPVDNV KLTVVVNNIA 
101 VAEQIGELFI HCRHGCRVAG SGKPPIFEVD PRGCPFTIKL SARKDHEGSC 
151 DYRPVRCPNN PSCPPLLRMN LEAHLKECEH IKCPHSKYGC TFIGNQDTYE 
201 THLETCRFEG LKEFLQQTDD RFHEMHVALA QKDQEIAFLR SMLGKLSEKI 
251 DQLEKSLELK FDVLDENQSK LSEDLMEFRR DASMLNDELS HINARLNMGI 
301 LGSYDPQQIF KCKGTFVGHQ GPVWCLCVYS MGDLLFSGSS DKTIKVWDTC 
351 TTYKCQKTLE GHDGIVLALC IQGCKLYSGS ADCTIIVWDI QNLQKVNTIR 
401 AHDNPVCTLV SSHNVLFSGS LKAIKVWDIV GTELKLKKEL TGLNHWVRAL 
451 VAAQSYLYSG SYQTIKIWDI RTLDCIHVLQ TSGGSVYSIA VTNHHIVCGT 
501 YENLIHVWDI ESKEQVRTLT GHVGTVYALA VISTPDQTKV FSASYDRSLR 
551 VWSMDNMICT QTLLRHQGSV TALAVSRGRL FSGAVDSTVK VWTC 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_li2, frame 2 

SWISSPROT:KMHB_DICDI MYOSIN HEAVY CHAIN KINASE B (EC 2.7.1.129) (MHCK 
B)., N = 1, Score = 419, P = 3.6e-37 

SWISSPROT:HET1_PODAN VEGETATIBLE INCOMPATIBILITY PROTEIN HET-E-1., N = 
1, Score = 392, P = 3.1e-33 

SWISSPROT:YDJ5_SCHPO HYPOTHETICAL 67.1 KD TRP-ASP REPEATS CONTAINING 
PROTEIN C57A10.05C IN CHROMOSOME I., N = 1, Score = 357, P = 4.1e-30 

TREMBL:AF032878_1 gene: "slimb"; product: "Slimb"; Drosophila 
melanogaster Slimb (slimb) mRNA, complete cds., N = 1, Score = 347, P = 
1.7e-29 



>SWISSPROT;KMHB_DlCDI MYOSIN HEAVY CHAIN KINASE B (EC 2.7.1.129) (MHCK B) . 
Length = 732 

HSPs: 



Score = 419 (62.9 bits), Expect - 3.6e-37, P - 3.6e-37 
Identities « 96/268 (35%), Positives = 158/268 (58%) 



Query: 


325 


CLCVYSMGDLLFSGSSDKTIKVWD-TCTTYKCQKTLEGHDGIVLALCIQGCKLYSGSADC 


383 






C+C +LLF+G SD +I+V+D +C +TL+GH+G V ++C L+SGS+D 




Sbjct: 


467 


CIC DNLLFTGCSDNSIRVYDYKSQNMECVQTLKGHEGPVESICYNDQYLFSGSSDH 


522 


Query: 


364 


TIIVWDIQNLQKVNTIRAHDNPVCTLVSSHNVLFSGSL-KAIKVWDIVGTELKLKKELTG 


442 






+ 1 VWD++ L+ + T+ HD PV T++ + LFSGS K IKVWD+ L+ K L 




Sbjct: 


523 


SIKVWDLKKLRCIFTLEGHDKPVHTVLLNDKYLFSGSSDKTIKVWDL— KTLECKYTLES 


580 


Query: 


443 


LNHWVRALVAAQSYLYSGSY-QTIKIWDIRTLDCIHVLQTSGGSVYSIAVTNHHIVCGTY 


501 






V+ L + YL+SGS +TIK+WD++T C + L+ V +1 + ++ G+Y 




Sbjct: 


581 


HARAVKTLC I SGQYLFSGSNDKTI KVWDLKTFRCNYTLKGHTKWVTTI C I LGTNL YSGS Y 


640 


Que ry : 


502 


ENLIHVWDIESKEQVRTLTGHVGTVYALAVISTPDQTKVFSASYDRSLRVWSMDNMICTQ 


561 






+ I VW+++S E TL GH V + + D+ +F+AS D ++++W ++ + C 




Sbjct: 


641 


DKTIRVWNLKSLECSATLRGHDRWVEHMVIC DKL-LFTASDDNTIKIWDLETLRCNT 


696 


Query: 


562 


TLLRHQGSVTALAVSRGR — LFSGAVDSTVKVW 592 








TL H +V LAV + + S + D +++VW 




Sbjct: 


697 


TLEGHNATVQCLAVWEDKKCVI SCSHDQSIRVW 729 




Score 


= 415 


(62.3 bits), Expect = 1.2e-36, P - 1.2e-36 




Identities - 113/303 (37%), Positives = 166/303 (54%) 




Query: 


255 


KSLEL-KFDVLDENQSKLSEDLMEFRRDASMLNDEL-SHINARLNMGILGS YD 


305 






KS++L K ++L N+ K S +L + ++ + SH+ N+ G YD 




Sbjct: 


427 


KSIDLEKPEILINNKKKESINLETIKLIETIKGYHVTSHLCICDNLLFTGCSDNSIRVYD 


486 


Query: 


306 


-PQQIFKCKGTFVGHQGPVWCLCVYSMGDLLFSGSSDKTIKVWDTCTTYKCQKTLEGHDG 


364 






Q +C T GH+GPV +C Y+ LFSGSSD +IKVWD +C TLEGHD 




Sbjct: 


487 


YKSQNMECVQTLKGHEGPVESIC-YN-DQYLFSGSSDHSIKVWDL-KKLRCIFTLEGHDK 


543 


Query: 


365 


IVLALC IQGCKLYSGS ADCTI I VWDIQNLQKVNTIRAHDNPVCTLVSSHNVLFSGSL-KA 


423 






V + + L+SGS+D TI VWD++ L+ T+ +H V TL S LFSGS K 




Sbjct: 


544 


PVHTVLLNDKYLFSGSSDKTIKVWDLKTLECKYTLESHARAVKTLCISGQYLFSGSNDKT 


603 


Query: 


424 


IKVWDIVGTELKLKKELTGLNHWVRALVAAQSYLYSGSY-QTIKIWDIRTLDCIHVLQTS 


482 






IKVWD+ + L G WV + + LYSGSY +TI++W++++L+C L+ 




Sbjct: 


604 


IKVWDL--KTFRCNYTLKGHTKWVTTICILGTNLYSGSYDKTIRVWNLKSLECSATLRGH 


661 



481 



WO 01/12659 



PCT/IBOO/01496 



Query: 483 GGSVYSIAVTNHHIVCGTYENLIHVWDIESKEQVRTLTGHVGTVYALAVISTPDQTKVFS 542
V++ ++ ++NI +WD+E+ TL GH TV LAV D+ V S
Sbjct: 662 DRWVEHMVICDKLLFTASDDNTIKIWDLETLRCNTTLEGHNATVQCLAVWE--DKKCVIS 719

Query: 543 ASYDRSLRVW 552
S+D+S+RVW
Sbjct: 720 CSHDQSIRVW 729




Score = 262 (39.3 bits), Expect = 3.2e-19, P = 3.2e-19
Identities = 60/184 (32%), Positives = 109/184 (59%)

Query: 352 TYKCQKTLEGHDGIVLALCIQGCKLYSGSADCTIIVWDI--QNLQKVNTIRAHDNPVCTL 409
T K +T++G+ + LCI L++G +D +I V+D QN++ V T++ H+ PV ++
Sbjct: 450 TIKLIETIKGYH-VTSHLCICDNLLFTGCSDNSIRVYDYKSQNMECVQTLKGHEGPVESI 508

Query: 410 VSSHNVLFSGSLK-AIKVWDIVGTELKLKKELTGLNHWVRALVAAQSYLYSGSY-QTIKI 467
+ LFSGS +IKVWD+ +L+ L G + V ++ YL+SGS +TIK+
Sbjct: 509 CYNDQYLFSGSSDHSIKVWDL--KKLRCIFTLEGHDKPVHTVLLNDKYLFSGSSDKTIKV 566

Query: 468 WDIRTLDCIHVLQTSGGSVYSIAVTNHHIVCGTYENLIHVWDIESKEQVRTLTGHVGTVY 527
WD++TL+C + L++ +V ++ ++ ++ G+ + I VWD+++ TL GH V
Sbjct: 567 WDLKTLECKYTLESHARAVKTLCISGQYLFSGSNDKTIKVWDLKTFRCNYTLKGHTKWVT 626

Query: 528 ALAVIST 534
+ ++ T
Sbjct: 627 TICILGT 633




Score = 173 (26.0 bits), Expect = 1.7e-09, P = 1.7e-09
Identities = 43/118 (36%), Positives = 65/118 (55%)

Query: 310 FKCKGTFVGHQGPVWCLCVYSMGDLLFSGSSDKTIKVWDTCTTYKCQKTLEGHDGIVLAL 369
F+C T GH V +C+ +G L+SGS DKTI+VW+ + +C TL GHD V +
Sbjct: 612 FRCNYTLKGHTKWVTTICI--LGTNLYSGSYDKTIRVWNL-KSLECSATLRGHDRWVEHM 668

Query: 370 CIQGCKLYSGSADCTIIVWDIQNLQKVNTIRAHDNPV-CTLVSSHN--VLFSGSLKAIKV 426
I L++ S D TI +WD++ L+ T+ H+ V C V V+ ++I+V
Sbjct: 669 VICDKLLFTASDDNTIKIWDLETLRCNTTLEGHNATVQCLAVWEDKKCVISCSHDQSIRV 728

Query: 427 W 427
W
Sbjct: 729 W 729





Pedant information for DKFZphutel_li2, frame 2 



Report for DKFZphutel_li2.2

[LENGTH] 594
[MW] 66541.94
[pI] 6.64
[HOMOL] SWISSPROT:KMHB_DICDI MYOSIN HEAVY CHAIN KINASE B (EC 2.7.1.129) (MHCK B) 3e-37
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 01.01.04 regulation of amino-acid metabolism [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YCR072c beta-transducin family] 2e-15
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YFL009w] 1e-14
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL009w] 1e-14
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YFL009w] 1e-14
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YFL009w] 1e-14
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL145c] 1e-13
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 1e-13
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 2e-11
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR178w] 2e-11
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 3e-11
[FUNCAT] 03.13 meiosis [S. cerevisiae, YLR129w] 8e-09
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YCR057c] 2e-07
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 2e-07
[FUNCAT] 02.16 fermentation [S. cerevisiae, YMR116c] 5e-07
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YMR116c] 5e-07




[FUNCAT] 06.13 proteolysis [S. cerevisiae, YGL003c] 3e-06
[FUNCAT] 03.01 cell growth [S. cerevisiae, YKL021c] 2e-04
[FUNCAT] 01.03.07 deoxyribonucleotide metabolism [S. cerevisiae, YOR269w] 2e-04
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YOR212w] 0.001
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YOR212w] 0.001
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YOR212w] 0.001
[BLOCKS] BL00678
[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins
[SCOP] d1tbgd_ 2.46.3.1.1 beta1-subunit of the signal-transducing 3e-10
[EC] 2.7.1.129 Myosin-heavy-chain kinase 3e-26
[PIRKW] * phosphotransferase 3e-26
[PIRKW] nucleus 1e-06
[PIRKW] plasma 9e-08
[PIRKW] duplication 3e-25
[PIRKW] hormone 9e-08
[PIRKW] zinc 3e-09
[PIRKW] cell cycle control 4e-13
[PIRKW] transmembrane protein 3e-12
[PIRKW] zinc finger 1e-08
[PIRKW] stomach 9e-08
[PIRKW] dna binding 9e-06
[PIRKW] autophosphorylation 3e-26
[PIRKW] phosphoprotein 3e-26
[PIRKW] signal transduction 5e-08
[PIRKW] heterotrimer 5e-08
[PIRKW] coiled coil 3e-26
[PIRKW] multimer 3e-26
[PIRKW] transcription regulation 4e-10
[PIRKW] GTP binding 5e-08
[SUPFAM] chromobox homology 9e-06
[SUPFAM] RING finger homology 3e-09
[SUPFAM] coatomer complex beta' chain 1e-07
[SUPFAM] WD repeat homology 3e-26
[SUPFAM] yeast coatomer complex alpha chain 3e-12
[SUPFAM] GTP-binding regulatory protein beta chain 5e-08
[SUPFAM] PRL1 protein 2e-09
[PROSITE] WD_REPEATS 2
[PROSITE] LEUCINE_ZIPPER 1
[PROSITE] MYRISTYL 14
[PROSITE] CK2_PHOSPHO_SITE 4
[PROSITE] ZINC_FINGER_C3HC4 1
[PROSITE] PKC_PHOSPHO_SITE 18
[PROSITE] ASN_GLYCOSYLATION 1
[PFAM] Zinc finger, C3HC4 type (RING finger)
[PFAM] WD domain, G-beta repeats
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 6.23 %
[KW] COILED_COIL 6.73 %

SEQ MPPISTPRRSDSAISVRSLHSESSMSLRSTFSLPEEEEEPEPLVFAEQPSVKLCCQLCCS
SEG xxxxxxxxxxxxxxx .... xxxxxxxxx
COILS
1gg2B

SEQ VFKDPVITTCGHTFCRRCALKSEKCPVDNVKLTVVVNNIAVAEQIGELFIHCRHGCRVAG
SEG
COILS
1gg2B

SEQ SGKPPIFEVDPRGCPFTIKLSARKDHEGSCDYRPVRCPNNPSCPPLLRMNLEAHLKECEH
SEG
COILS
1gg2B

SEQ IKCPHSKYGCTFIGNQDTYETHLETCRFEGLKEFLQQTDDRFHEMHVALAQKDQEIAFLR
SEG
COILS CCCCCCCCCCCCCC
1gg2B

SEQ SMLGKLSEKIDQLEKSLELKFDVLDENQSKLSEDLMEFRRDASMLNDELSHINARLNMGI
SEG
COILS CCCCCCCCCCCCCCCCCCCCCCCCCC
1gg2B

SEQ LGSYDPQQIFKCKGTFVGHQGPVWCLCVYSMGDLLFSGSSDKTIKVWDTCTTYKCQKTLE
SEG
COILS
1gg2B EECCCCCCEEEEEETTTTCEEEEEETTTEEEEEEG-GGCEEEEEEE






SEQ GHDGIVLALCIQGCKLYSGSADCTIIVWDIQNLQKVNTIRAHDNPVCTLVSSHNVLFSGS
SEG
COILS
1gg2B CCCCCEEEEEETTCEEEEEETTTCEEEEETTTTEEEEEE-CTTTTCCCEEE

SEQ LKAIKVWDIVGTELKLKKELTGLNHWVRALVAAQSYLYSGSYQTIKIWDIRTLDCIHVLQ
SEG xxxxxxxxxxxxx
COILS
1gg2B

SEQ TSGGSVYSIAVTNHHIVCGTYENLIHVWDIESKEQVRTLTGHVGTVYALAVISTPDQTKV
SEG
COILS
1gg2B

SEQ FSASYDRSLRVWSMDNMICTQTLLRHQGSVTALAVSRGRLFSGAVDSTVKVWTC
SEG
COILS
1gg2B



Prosite for DKFZphutel_li2.2

PS00001 267->271 ASN_GLYCOSYLATION PDOC00001
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005
PS00005 15->18 PKC_PHOSPHO_SITE PDOC00005
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005
PS00005 50->53 PKC_PHOSPHO_SITE PDOC00005
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005
PS00005 121->124 PKC_PHOSPHO_SITE PDOC00005
PS00005 137->140 PKC_PHOSPHO_SITE PDOC00005
PS00005 141->144 PKC_PHOSPHO_SITE PDOC00005
PS00005 205->208 PKC_PHOSPHO_SITE PDOC00005
PS00005 247->250 PKC_PHOSPHO_SITE PDOC00005
PS00005 340->343 PKC_PHOSPHO_SITE PDOC00005
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005
PS00005 352->355 PKC_PHOSPHO_SITE PDOC00005
PS00005 398->401 PKC_PHOSPHO_SITE PDOC00005
PS00005 420->423 PKC_PHOSPHO_SITE PDOC00005
PS00005 464->467 PKC_PHOSPHO_SITE PDOC00005
PS00005 548->551 PKC_PHOSPHO_SITE PDOC00005
PS00005 588->591 PKC_PHOSPHO_SITE PDOC00005
PS00006 32->36 CK2_PHOSPHO_SITE PDOC00006
PS00006 201->205 CK2_PHOSPHO_SITE PDOC00006
PS00006 330->334 CK2_PHOSPHO_SITE PDOC00006
PS00006 533->537 CK2_PHOSPHO_SITE PDOC00006
PS00008 115->121 MYRISTYL PDOC00008
PS00008 133->139 MYRISTYL PDOC00008
PS00008 194->200 MYRISTYL PDOC00008
PS00008 299->305 MYRISTYL PDOC00008
PS00008 314->320 MYRISTYL PDOC00008
PS00008 364->370 MYRISTYL PDOC00008
PS00008 379->385 MYRISTYL PDOC00008
PS00008 419->425 MYRISTYL PDOC00008
PS00008 460->466 MYRISTYL PDOC00008
PS00008 484->490 MYRISTYL PDOC00008
PS00008 499->505 MYRISTYL PDOC00008
PS00008 524->530 MYRISTYL PDOC00008
PS00008 568->574 MYRISTYL PDOC00008
PS00008 583->589 MYRISTYL PDOC00008
PS00518 70->80 ZINC_FINGER_C3HC4 PDOC00449
PS00029 436->458 LEUCINE_ZIPPER PDOC00029
PS00678 335->350 WD_REPEATS PDOC00574
PS00678 376->391 WD_REPEATS PDOC00574



Pfam for DKFZphutel_li2.2 



HMM_NAME WD domain, G-beta repeats 

HMM *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD*
++GH ++VWC+ + G + ++SGS D+T+++WD
Query 316 FVGHQGPVWCLCVYSMGDL-LFSGSSDKTIKVWD 348

22.93 519 553 1 34 dkfzphutel_li2.2 similarity to Dictyostelium myosin heavy chain kinase

Alignment to HMM consensus: 






Query *MrGHnnWVWCVaF..SPDGrWFIvSGSWDgTCRLWD*
++GH ++V+++A+ +PD ++S+S D+++R+W+
dkfzphutel 519 LTGHVGTVYALAVISTPDQTK-VFSASYDRSLRVWS 553



HMM_NAME Zinc finger, C3HC4 type (RING finger) 

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW..CPmC*
C++C + F++P++++CGH+FC+ C +++ CP+
Query 55 CQLC CSV FKDPVITTCGHTFCRRCALKSEKCPVD 88



DKFZphutel_20bl9



group: metabolism 

DKFZphutel_20bl9 encodes a novel 486-amino-acid protein with similarity to bacterial sarcosine oxidases (EC 1.5.3.1).

The novel protein appears to be a new enzyme with sarcosine oxidase activity.

The new protein could be used to modulate sarcosine metabolism and as a new enzyme in biotechnological production processes.

similarity to sarcosine oxidases
membrane regions: 1

Summary: DKFZphutel_20bl9 encodes a novel 486-amino-acid protein with similarity to sarcosine oxidases.

similarity to sarcosine oxidases

complete cDNA?, complete cds, potential start at bp 48, EST hits

Sequenced by AGOWA

Locus: unknown

Insert length: 1967 bp

Poly A stretch at pos. 1950, no polyadenylation signal found 



1 AGCGAGGCAG CAGTGCAGCT TTCAGAGGGT CCGGGCTCAG AGGGGTTATG 

51 ATTCGGAGGG TTCTGCCGCA CGGCATGGGC CGGGGCCTCT TGACCCGGAG 

101 GCCAGGCACG CGCAGAGGAG GCTTTTCTCT GGACTGGGAT GGAAAGGTGT 

151 CTGAGATTAA GAAGAAGATC AAGTCGATCC TGCCTGGAAG GTCCTGTGAT 

201 CTACTGCAAG ACACCAGCCA CCTGCCTCCC GAGCACTCGG ATGTGGTGAT 

251 CGTGGGAGGT GGGGTGCTTG GCTTGTCTGT GGCCTATTGG CTGAAGAAGC 

301 TGGAGAGCAG ACGAGGTGCT ATTCGAGTGC TAGTGGTGGA ACGGGACCAC 

351 ACGTATTCAC AGGCCTCCAC TGGGCTCTCA GTAGGTGGGA TTTGTCAGCA 

401 GTTCTCATTG CCTGAGAACA TCCAGCTCTC CCTCTTTTCA GCCAGCTTTC 

451 TACGGAACAT CAATGAGTAC CTGGCCGTAG TCGATGCTCC TCCCCTGGAC

501 CTCCGGTTCA ACCCCTCGGG CTACCTCTTG CTGGCTTCAG AAAAGGATGC 

551 TGCAGCCATG GAGAGCAACG TGAAAGTGCA GAGGCAGGAG GGAGCCAAAG 

601 TTTCTCTGAT GTCTCCTGAT CAGCTTCGGA ACAAGTTTCC CTGGATAAAC 

651 ACAGAGGGAG TGGCTTTGGC GTCTTATGGG ATGGAGGACG AAGGTTGGTT 

701 TGACCCCTGG TGTCTGCTCC AGGGGCTTCG GCGAAAGGTC CAGTCCTTGG 

751 GAGTCCTTTT CTGCCAGGGA GAGGTGACAC GTTTTGTCTC TTCATCTCAA 

801 CGCATGTTGA CCACAGATGA CAAAGCGGTG GTCTTGAAAA GGATCCATGA 

851 AGTCCATGTG AAGATGGACC GCAGCCTGGA GTACCAGCCT GTGGAATGCG 

901 CCATTGTGAT CAACGCAGCC GGAGCCTGGT CTGCGCAAAT CGCAGCACTG 

951 GCTGGTGTTG GAGAGGGGCC GCCTGGCACC CTGCAGGGCA CCAAGCTACC 

1001 TGTGGAGCCG AGGAAAAGGT ATGTGTATGT GTGGCACTGC CCCCAGGGAC 

1051 CAGGCCTAGA GACTCCGCTT GTTGCAGACA CCAGTGGAGC CTATTTTCGC 

1101 CGGGAAGGAT TAGGTAGCAA CTACCTAGGT GGTCGTAGCC CCACTGAGCA 

1151 GGAAGAACCG GACCCGGCGA ACCTGGAAGT GGACCATGAT TTCTTCCAGG 

1201 ACAAGGTGTG GCCCCATTTG GCCCTGAGGG TCCCAGCTTT TGAGACTCTG 

1251 AAGGTTCAGA GCGCCTGGGC CGGCTATTAC GACTACAACA CCTTTGACCA 

1301 GAATGGCGTG GTGGGCCCCC ACCCGCTAGT TGTCAACATG TACTTTGCTA 

1351 CTGGCTTCAG TGGTCACGGG CTCCAGCAGG CCCCTGGCAT TGGGCGAGCT 

1401 GTAGCAGAGA TGGTACTGAA GGGCAGGTTC CAGACCATCG ACCTGAGCCC 

1451 CTTCCTCTTT ACCCGCTTTT ACTTGGGAGA GAAGATCCAG GAGAACAACA

1501 TCATCTGAGC ATGTGTGCTC TGCACTGGCT CCACTGGCTT GCATCCTGGC 

1551 TGTGTTCACA GCCTTGTTTG CTGCTTCCAT CTTCCCCAGT ACTGTGCCAG 

1601 GCCTTCTCCC CCTCCCCAGT GTCCTCTCCT CTCAGGCAGG CCATTGCACC 

1651 CATATGGCTG GGCAGGCACA GGCAGTGAGG CCGAGGCCAA TAGCGAGTGA 

1701 TGAGCGGGAT CCTAGGACTG ATCTGTAGCC CATGCTGATG TCACCCACCA 

1751 GGGCAATCCA TCTGGAGGCC TGAGCACCCT GGCCCAGGAC TGGCTTCATC 

1801 CTGGCACTGA CCAGGAAAGA CTGCCTCTGA CCCTCTTAGC AGACAGAGCC 

1851 CAGGCATGGG AGCACTCTGG GGCAGCCTGG CTCAGGTTTA TTGATTTTCG 

1901 TCTGTTTACC CTATCCATTA ATCAATACAT GTAATTAACT CCTTCCCTCC 

1951 AAAAAAAAAA AAAAAAA 



BLAST Results 



NO BLAST result 






Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 48 bp to 1505 bp; peptide length: 486 
Category: similarity to known protein 



1 MIRRVLPHGM GRGLLTRRPG TRRGGFSLDW DGKVSEIKKK IKSILPGRSC 

51 DLLQDTSHLP PEHSDVVIVG GGVLGLSVAY WLKKLESRRG AIRVLVVERD 

101 HTYSQASTGL SVGGICQQFS LPENIQLSLF SASFLRNINE YLAVVDAPPL 

151 DLRFNPSGYL LLASEKDAAA MESNVKVQRQ EGAKVSLMSP DQLRNKFPWI 

201 NTEGVALASY GMEDEGWFDP WCLLQGLRRK VQSLGVLFCQ GEVTRFVSSS 

251 QRMLTTDDKA VVLKRIHEVH VKMDRSLEYQ PVECAIVINA AGAWSAQIAA 

301 LAGVGEGPPG TLQGTKLPVE PRKRYVYVWH CPQGPGLETP LVADTSGAYF 

351 RREGLGSNYL GGRSPTEQEE PDPANLEVDH DFFQDKVWPH LALRVPAFET 

401 LKVQSAWAGY YDYNTFDQNG VVGPHPLVVN MYFATGFSGH GLQQAPGIGR 

451 AVAEMVLKGR FQTIDLSPFL FTRFYLGEKI QENNII 
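The ORF coordinates can be checked against the peptide length and the first residues. The Python sketch below makes two assumptions: it uses the standard genetic code (NCBI translation table 1), and it reads the stated ORF end (bp 1505) as the last base of the final sense codon, so the stop codon is excluded. Under that reading, (1505 - 48 + 1) / 3 = 486 codons, which matches the reported peptide length, and bp 1506-1508 of the cDNA read TGA. The helper names are illustrative and do not come from any tool used to produce this report.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def orf_length_in_codons(start: int, end: int) -> int:
    """Number of codons in a 1-based, inclusive ORF interval [start, end]."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a multiple of 3")
    return span // 3


def translate(dna: str) -> str:
    """Translate an in-frame DNA string ('*' = stop); ambiguous bases raise KeyError."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# bp 48-77 of the cDNA: the first ten codons of the ORF.
print(orf_length_in_codons(48, 1505))                  # 486
print(translate("ATGATTCGGAGGGTTCTGCCGCACGGCATG"))     # MIRRVLPHGM
# bp 1503-1508: the final Ile codon followed by the TGA stop codon.
print(translate("ATCTGA"))                             # I*
```

The same arithmetic applies to the other entries. For example, the ORF of DKFZphutel_20g21 spans (2602 - 20 + 1) / 3 = 861 codons.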



BLASTP hits 



No BLAST P hits available 



Alert BLASTP hits for DKFZphutel_20bl9, frame 3 

TREMBL:CEM04B2_4 gene: "M04B2.4"; Caenorhabditis elegans cosmid M04B2, N = 1, Score = 801, P = 9.2e-80

PIR:B71184 probable sarcosine oxidase - Pyrococcus horikoshii, N = 2, Score = 194, P = 2e-26

PIR:B69284 sarcosine oxidase, subunit beta (soxB) homolog - Archaeoglobus fulgidus, N = 3, Score = 189, P = 8.2e-22

TREMBL:AF042732_1 gene: "Bb"; product: "unknown protein"; Anopheles gambiae (Bb) gene, partial cds; and TU37B2 (TU37B2) and diphenol oxidase-A2 (Dox-A2) genes, complete cds., N = 1, Score = 386, P = 8.7e-36

PIR:F71008 probable sarcosine oxidase - Pyrococcus horikoshii, N = 2, Score = 200, P = 4e-25



>TREMBL:CEM04B2_4 gene: "M04B2.4"; Caenorhabditis elegans cosmid M04B2 
Length =527 

HSPs: 

Score = 801 (120.2 bits), Expect = 9.2e-80, P = 9.2e-80
Identities = 171/433 (39%), Positives = 260/433 (60%)

Query: 61 PEHSDVVIVGGGVLGLSVAYWLKKLESRRGAIRVLVVERDHTYSQASTGLSVGGICQQFS 120 

P +++VI+GGG+ G S A+WLK+ R +V+VVE + ++++ST LS GGI QQFS 
Sbjct: 91 PYRAEIVIIGGGLSGSSTAFWLKE-RFRDEDFKVVVVENNDVFTKSSTMLSTGGITQQFS 149 

Query: 121 LPENIQLSLFSASFLRNINEYLAVVDAPPLDLRFNPSGYLLLA-SEKDAAAMESNVKVQR 179 

+PE + +SLF+ FLR+ E+L ++D+ D+ F P+GYL LA ++++ M S KVQ 
Sbjct: 150 IPEFVDMSLFTTEFLRHAGEHLRILDSEQPDINFFPTGYLRLAKTDEEVEMMRSAWKVQI 209 

Query: 180 QEGAKVSLMSPDQLRNKFPWINTEGVALASYGMEDEGWFDPWCLLQGLRRKVQSLGVLFC 239 

+ GAKV L+S D+L ++P++N + V LAS G+E+EG D W LL +R K +LGV + 
Sbjct: 210 ERGAKVQLLSKDELTKRYPYMNVDDVLLASLGVENEGTIDTWQLLSAIREKNITLGVQYV 269 

Query: 240 QGEVTRFVSSSQRM LTTDDKAVVLKRIHEVHVKMDRS-LEYQPVECAIVI 288 

+GEV F R T D+ + +RI V V+ + +P+ +++ 

Sbjct: 270 KGEVEGFQFERHRASSEVHAFGDDATADENKLRAQRISGVLVRPQMNDASARPIRAHLIV 329 

Query: 289 NAAGAWSAQIAALAGVGEGPPGTLQGTKLPVEPRKRYVYVWHCPQGPGLETPLVADTS-G 347 

NAAG W+ Q+A +AG+G+G G L +P++PRKR V+V P P +P+DSG 
Sbjct: 330 NAAGPWAGQVAKMAGIGKGT-GLL-AVPVPIQPRKRDVFVIFAPDVPS-DLPFIIDPSTG 386 

Query: 348 AYFRREGLGSNYLGGRSPTEQEEP--DPANLEVDHDFFQDKVWPHLALRVPAFETLKVQS 405

+ R+ G +L GR+P+++E+ D +NL+VD+D F K+WP L RVP F+T KV+S 
Sbjct: 387 VFCRQTDSGQTFLVGRTPSKEEDAKRDHSNLDVDYDDFYQKIWPVLVDRVPGFQTAKVKS 446 
Query: 406 AWAGYYDYNTFDQNGVVGPHPLVVNMYFATGFSGHGLQQAPGIGRAVAEMVLKGRFQTID 465

AW+GY D NTFD V+G HPL N++ GF G+ + RA AE + G + ++ 
Sbjct: 447 AWSGYQDINTFDDAPVIGEHPLYTNLHMMCGFGERGVMHSMAAARAYAERIFDGAYINVN 506 

Query: 466 LSPFLFTRFYLGEKIQE 482

L F R + I E 
Sbjct: 507 LRKFDMRRIVKMDPITE 523 
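A few pieces of arithmetic in these alignment reports can be reproduced directly.

- **Identity percentages.** The percentages after Identities and Positives agree with integer truncation, not rounding. For example, 60/184 = 32.6% is printed as 32% and 166/303 = 54.8% as 54%.
- **P-values.** For a single HSP, the standard Poisson relationship P = 1 - exp(-E) explains why P and Expect coincide at the small values shown.
- **Line coordinates.** The end coordinate of each Query/Sbjct line equals the start plus the number of non-gap residues, minus one.

The sketch below illustrates these checks. The function names are hypothetical, and the P/E relationship shown does not cover the N > 1 sum statistics in the alert-hit summaries.

```python
import math


def blast_percent(matches: int, length: int) -> int:
    """Percentage as printed in these reports: truncated, not rounded."""
    return (100 * matches) // length


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance HSP for expectation E: P = 1 - exp(-E).

    math.expm1 keeps full precision when E is tiny (e.g. 1.2e-36).
    """
    return -math.expm1(-expect)


def segment_end(start: int, aligned: str) -> int:
    """Last coordinate of an alignment line; gap characters ('-') consume no residues."""
    residues = len(aligned) - aligned.count("-")
    return start + residues - 1


print(blast_percent(60, 184), blast_percent(166, 303))  # 32 54
print(p_from_expect(1.2e-36))
print(segment_end(466, "LSPFLFTRFYLGEKIQE"))            # 482
```

The coordinate check is a quick way to catch OCR damage in listings like this one. A line whose stated end does not match its residue count has probably lost or gained characters.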



Pedant information for DKFZphutel_20bl9, frame 3 



Report for DKFZphutel_20bl9.3

[LENGTH] 486
[MW] 53811.85
[pI] 7.66
[HOMOL] TREMBL:CEM04B2_4 gene: "M04B2.4"; Caenorhabditis elegans cosmid M04B2 1e-78
[FUNCAT] c energy conversion [H. influenzae, HI0499] 8e-05
[BLOCKS] BL00677A D-amino acid oxidases proteins
[BLOCKS] BL00623A GMC oxidoreductases proteins
[BLOCKS] BL01304A
[EC] 1.5.99.2 Dimethylglycine dehydrogenase 2e-07
[PIRKW] flavoprotein 2e-07
[PIRKW] oxidoreductase 2e-07
[PROSITE] MYRISTYL 12
[PROSITE] CK2_PHOSPHO_SITE 5
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 6
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 7.00 %





SEQ MIRRVLPHGMGRGLLTRRPGTRRGGFSLDWDGKVSEIKKKIKSILPGRSCDLLQDTSHLP 

SEG xxxxxxxxxxxxxxx xxxxxxxx 

PRD ccceeecccccceeecccccccccccccccccchhhhhhhhhhccccccceeeccccccc 

MEM 

SEQ PEHSDVVIVGGGVLGLSVAYWLKKLESRRGAIRVLVVERDHTYSQASTGLSVGGICQQFS 

SEG xxxxxxxxxxx 

PRD cccceeeeeccccchhhhhhhhhhhhhhcccceeeeeeccccccccccccccccceeeec 

MEM MMMMMMMMMMMMMMMMM 

SEQ LPENIQLSLFSASFLRNINEYLAVVDAPPLDLRFNPSGYLLLASEKDAAAMESNVKVQRQ

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhccccceeecccceeeehhhhhhhhhhhhhhhhhh 

MEM 

SEQ EGAKVSLMSPDQLRNKFPWINTEGVALASYGMEDEGWFDPWCLLQGLRRKVQSLGVLFCQ 

SEG 

PRD cccceeecccchhhhhhccccccccccccccccccccccccchhhhhhhhhhhheeeeec 

MEM 

SEQ GEVTRFVSSSQRMLTTDDKAVVLKRIHEVHVKMDRSLEYQPVECAIVINAAGAWSAQIAA 

SEG 

PRD ceeeeecccccccccccchhhhhhhhhheeeecccccccccceeeeeeecccchhhhhhh 

MEM 

SEQ LAGVGEGPPGTLQGTKLPVEPRKRYVYVWHCPQGPGLETPLVADTSGAYFRREGLGSNYL 

SEG 

PRD hhccccccccccccccccccccceeeeeeecccccccccceeeccccceeeeccccccee 

MEM 

SEQ GGRSPTEQEEPDPANLEVDHDFFQDKVWPHLALRVPAFETLKVQSAWAGYYDYNTFDQNG 

SEG 

PRD ecccccccccccccccccccchhhhhhhhhhhhhhcchhhhhhhhhhheeeeeccccccc 

MEM 

SEQ VVGPHPLVVNMYFATGFSGHGLQQAPGIGRAVAEMVLKGRFQTIDLSPFLFTRFYLGEKI

SEG 

PRD cccccccccceeeecccccccccchhhhhhhhhhhhhhccceeeeccccccccccccccc 

MEM 

SEQ QENNII 

SEG 

PRD cccccc 

MEM 
Prosite for DKFZphutel_20bl9.3

PS00002 438->442 GLYCOSAMINOGLYCAN PDOC00002
PS00005 16->19 PKC_PHOSPHO_SITE PDOC00005
PS00005 21->24 PKC_PHOSPHO_SITE PDOC00005
PS00005 87->90 PKC_PHOSPHO_SITE PDOC00005
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005
PS00005 250->253 PKC_PHOSPHO_SITE PDOC00005
PS00005 400->403 PKC_PHOSPHO_SITE PDOC00005
PS00006 120->124 CK2_PHOSPHO_SITE PDOC00006
PS00006 164->168 CK2_PHOSPHO_SITE PDOC00006
PS00006 255->259 CK2_PHOSPHO_SITE PDOC00006
PS00006 364->368 CK2_PHOSPHO_SITE PDOC00006
PS00006 366->370 CK2_PHOSPHO_SITE PDOC00006
PS00008 9->15 MYRISTYL PDOC00008
PS00008 20->26 MYRISTYL PDOC00008
PS00008 71->77 MYRISTYL PDOC00008
PS00008 75->81 MYRISTYL PDOC00008
PS00008 109->115 MYRISTYL PDOC00008
PS00008 182->188 MYRISTYL PDOC00008
PS00008 204->210 MYRISTYL PDOC00008
PS00008 235->241 MYRISTYL PDOC00008
PS00008 292->298 MYRISTYL PDOC00008
PS00008 310->316 MYRISTYL PDOC00008
PS00008 354->360 MYRISTYL PDOC00008
PS00008 447->453 MYRISTYL PDOC00008



(No Pfam data available for DKFZphutel_20bl9.3) 
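The Prosite hits listed for DKFZphutel_20bl9.3 are plain pattern matches, so they can be reproduced with regular expressions. The sketch below rests on three assumptions:

- It uses the standard PROSITE patterns for these entries: PS00001 N-{P}-[ST]-{P}, PS00002 S-G-x-G, PS00005 [ST]-x-[RK], PS00006 [ST]-x(2)-[DE] and PS00008 G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}.
- Those patterns are hand-translated into Python regex syntax.
- Each hit is reported as "a->b", where a is the 1-based first residue and b is one past the last residue. This convention is inferred from the listing: 16->19 covers T16-R17-R18.

It performs pattern matching only and is meant as a consistency check, not a replacement for ScanProsite.

```python
import re

# Peptide of DKFZphutel_20bl9, frame 3 (486 aa), copied from the frame-3 peptide listing.
PEPTIDE = "".join("""
MIRRVLPHGM GRGLLTRRPG TRRGGFSLDW DGKVSEIKKK IKSILPGRSC
DLLQDTSHLP PEHSDVVIVG GGVLGLSVAY WLKKLESRRG AIRVLVVERD
HTYSQASTGL SVGGICQQFS LPENIQLSLF SASFLRNINE YLAVVDAPPL
DLRFNPSGYL LLASEKDAAA MESNVKVQRQ EGAKVSLMSP DQLRNKFPWI
NTEGVALASY GMEDEGWFDP WCLLQGLRRK VQSLGVLFCQ GEVTRFVSSS
QRMLTTDDKA VVLKRIHEVH VKMDRSLEYQ PVECAIVINA AGAWSAQIAA
LAGVGEGPPG TLQGTKLPVE PRKRYVYVWH CPQGPGLETP LVADTSGAYF
RREGLGSNYL GGRSPTEQEE PDPANLEVDH DFFQDKVWPH LALRVPAFET
LKVQSAWAGY YDYNTFDQNG VVGPHPLVVN MYFATGFSGH GLQQAPGIGR
AVAEMVLKGR FQTIDLSPFL FTRFYLGEKI QENNII
""".split())

# PROSITE patterns rewritten by hand as regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",        # PS00001
    "GLYCOSAMINOGLYCAN": r"SG.G",                 # PS00002
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",             # PS00005
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",            # PS00006
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",   # PS00008
}


def scan(seq: str) -> dict[str, list[tuple[int, int]]]:
    """Hits per pattern as (first residue, one past last residue), 1-based."""
    hits = {}
    for name, pattern in PATTERNS.items():
        # A zero-width lookahead lets matches overlap (e.g. the CK2 sites at 364 and 366).
        hits[name] = [
            (m.start() + 1, m.start() + 1 + len(m.group(1)))
            for m in re.finditer(f"(?=({pattern}))", seq)
        ]
    return hits


for name, sites in scan(PEPTIDE).items():
    print(name, " ".join(f"{a}->{b}" for a, b in sites))
```

For this peptide the scan returns the six PKC, five CK2, twelve myristoylation and one glycosaminoglycan sites listed above, and no N-glycosylation site. These numbers agree with the counts in the Pedant summary.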
DKFZphutel_20g21



group: signal transduction 

DKFZphutel_20g21 encodes a novel 861-amino-acid protein with partial similarity to human ras inhibitor and other ras inhibitor proteins.

Ras is a signal-transducing molecule in the receptor tyrosine kinase/Ras/MAP kinase signalling cascade. Ras proteins bind GDP and GTP and have intrinsic GTPase activity. Mutations that change amino acid 12, 13 or 61 of Ras activate its potential to transform cultured cells and are implicated in a variety of human tumours. The novel protein appears to be a new ras inhibitor protein.

The new protein could be used to modulate or block ras-dependent signal transduction pathways.



Ras inhibitor 

additional 1188 bp at the 5' end and 1107 bp at the 3' end in comparison to 122483

Sequenced by AGOWA

Locus: unknown

Insert length: 4137 bp 

Poly A stretch at pos. 4116, no polyadenylation signal found 



1 GGGAGAACTG AAACAGGAGA TGGTGCGGAC AGATGTCAAC CTGGAAAATG
51 GCCTGGAACC CGCTGAAACC CACAGCATGG TAAGACACAA GGATGGTGGC
101 TATTCCGAGG AAGAGGACGT GAAGACCTGT GCCCGGGACT CAGGCTATGA
151 CAGCCTCTCC AACAGGCTCA GCATCTTGGA CCGGCTCCTC CACACCCACC
201 CCATATGGCT GCAGCTGAGT CTGAGTGAGG AGGAGGCAGC AGAGGTCCTG
251 CAGGCCCAGC CTCCGGGGAT CTTCCTGGTT CATAAATCTA CCAAGATGCA
301 GAAGAAAGTC CTCTCCCTCC GCCTGCCCTG TGAATTTGGG GCCCCACTCA
351 AGGAATTTGC CATAAAGGAA AGCACATACA CCTTTTCCCT GGAAGGCTCA
401 GGAATCAGTT TCGCAGATTT ATTCCGGCTC ATTGCTTTCT ACTGCATCAG
451 CAGGGATGTT CTACCATTTA CCTTGAAGTT GCCTTATGCC ATTTCAACAG
501 CCAAGTCGGA GGCTCAGCTT GAAGAACTGG CCCAGATGGG ACTAAATTTC
551 TGGAGCTCCC CAGCTGACAG CAAACCCCCG AACCTTCCAC CTCCCCATAG
601 GCCTCTTTCC TCCGACGGTG TCTGTCCTGC CTCCCTGCGT CAGCTCTGCC
651 TTATAAATGG AGTGCATTCT ATCAAAACCA GGACGCCTTC AGAGCTGGAG
701 TGCAGCCAGA CCAACGGGGC CCTGTGCTTT ATTAATCCCC TTTTCTTGAA
751 AGTGCACAGC CAGGACCTCA GTGGAGGCCT GAAACGGCCG AGCACAAGGA
801 CTCCCAACGC GAATGGCACG GAGCGGACTC GGTCCCCCCC ACCCAGGCCC
851 CCGCCACCCG CTATTAATAG TCTCCACACA AGCCCTCGGC TGGCCAGGAC
901 TGAAACCCAG ACGAGCATGC CAGAAACAGT CAACCATAAC AAACATGGGA
951 ACGTAGCTCT GCCTGGAACG AAACCAACTC CCATCCCTCC ACCCCGGCTG
1001 AAGAAGCAGG CTTCTTTTCT GGAAGCAGAG GGCGGTGCAA AGACCTTGAG
1051 CGGCGGCCGG CCGGGCGCAG GCCCGGAGCT GGAGCTGGGC ACAGCTGGCA
1101 GCCCAGGTGG GGCCCCGCCT GAGGCCGCCC CGGGGGATTG CACAAGGGCC
1151 CCGCCGCCCA GCTCTGAATC ACGGCCCCCG TGCCATGGAG GCCGGCAGCG
1201 GCTGAGCGAC ATGAGCATTT CTACTTCCTC CTCCGACTCG CTGGAGTTCG
1251 ACCGGAGCAT GCCTCTGTTT GGCTACGAGG CGGACACCAA CAGCAGCCTG
1301 GAGGACTACG AGGGGGAAAG TGACCAAGAG ACCATGGCGC CCCCCATCAA
1351 GTCCAAAAAG AAAAGGAGCA GCTCCTTCGT GCTGCCCAAG CTCGTCAAGT
1401 CCCAGCTGCA GAAGGTGAGC GGGGTGTTCA GCTCCTTCAT GACCCCGGAG
1451 AAGCGGATGG TCCGCAGGAT CGCCGAGCTT TCCCGGGACA AATGCACCTA
1501 CTTCGGGTGC TTAGTGCAGG ACTACGTGAG CTTCCTGCAG GAGAACAAGG
1551 AGTGCCACGT GTCCAGCACC GACATGCTGC AGACCATCCG GCAGTTCATG
1601 ACCCAGGTCA AGAACTATTT GTCTCAGAGC TCGGAGCTGG ACCCCCCCAT
1651 CGAGTCGCTG ATCCCTGAAG ACCAAATAGA TGTGGTGCTG GAAAAAGCCA
1701 TGCACAAGTG CATCTTGAAG CCCCTCAAGG GGCATGTGGA GGCCATGCTG
1751 AAGGACTTTC ACATGGCCGA TGGCTCATGG AAGCAACTCA AGGAGAACCT
1801 GCAGCTTGTG CGGCAGAGGA ATCCGCAGGA GCTGGGGGTC TTCGCCCCGA
1851 CCCCTGATTT TGTGGATGTG GAGAAAATCA AAGTCAAGTT CATGACCATG
1901 CAGAAGATGT ATTCGCCGGA AAAGAAGGTC ATGCTGCTGC TGCGGGTCTG
1951 CAAGCTCATT TACACGGTCA TGGAGAACAA CTCAGGGAGG ATGTATGGCG
2001 CTGATGACTT CTTGCCAGTC CTGACCTATG TCATAGCCCA GTGTGACATG
2051 CTTGAATTGG ACACTGAAAT CGAGTACATG ATGGAGCTCC TAGACCCATC
2101 GCTGTTACAT GGAGAAGGAG GCTATTACTT GACAAGCGCA TATGGAGCAC
2151 TTTCTCTGAT AAAGAATTTC CAAGAAGAAC AAGCAGCGCG ACTGCTCAGC
2201 TCAGAAACCA GAGACACCCT GAGGCAGTGG CACAAACGGA GAACCACCAA
2251 CCGGACCATC CCCTCTGTGG ACGACTTCCA GAATTACCTC CGAGTTGCAT
2301 TTCAGGAGGT CAACAGTGGT TGCACAGGAA AGACCCTCCT TGTGAGACCT
2351 TACATCACCA CTGAGGATGT GTGTCAGATC TGCGCTGAGA AGTTCAAGGT
2401 GGGGGACCCT GAGGAGTACA GCCTCTTTCT CTTCGTTGAC GAGACATGGC
2451 AGCAGCTGGC AGAGGACACT TACCCTCAAA AAATCAAGGC GGAGCTGCAC
2501 AGCCGACCAC AGCCCCACAT CTTCCACTTT GTCTACAAAC GCATCAAGAA
2551 CGATCCTTAT GGCATCATTT TCCAGAACGG GGAAGAAGAC CTCACCACCT
2601 CCTAGAAGAC AGGCGGGACT TCCCAGTGGT GCATCCAAAG GGGAGCTGGA
2651 AGCCTTGCCT TCCCGCTTCT ACATGCTTGA GCTTGAAAAG CAGTCACCTC
2701 CTCGGGGACC CCTCAGTGTA GTGACTAAGC CATCCACAGG CCAACTCGGC
2751 CAAGGGCAAC TTTAGCCACG CAAGGTAGCT GAGGTTTGTG AAACAGTAGG
2801 ATTCTCTTTT GGCAATGGAG AATTGCATCT GATGGTTCAA GTGTCCTGAG
2851 ATTGTTTGCT ACCTACCCCC AGTCAGGTTC TAGGTTGGCT TACAGGTATG
2901 TATATGTGCA GAAGAAACAC TTAAGATACA AGTTCTTTTG AATTCAACAG
2951 CAGATGCTTG CGATGCAGTG CGTCAGGTGA TTCTCACTCC TGTGGATGGC
3001 TTCATCCCTG CCTTCCTTCC TTTCTTTTTC CTTTTTTTTT TTTTTTTTTT
3051 TTTTTACAAA GAGCCTTCAT GTTTTTATAT ATTTCATAGA AATTTTTATA
3101 GCAGTTGCAG GTAAACTGTC AGGATTGGTT TTAAAATATT TTTGTAACTT
3151 TAAAATATTC TATAATTATG CATGTGATTT TAACATTTAA TATTCAAAAA
3201 TAAATCTCTT GCTGGATTTG AGAGTATTGC ATTTTTAAAG TCTCTCTTCT
3251 GTAACTGGAT GTTTTGGCAA CTTTGTGGGG AGAGACTGCT GGATTTCTTA
3301 AAGCAACGTA TTCCTGACAC TGGCCACAGA ATGCCTTTGG AAATCGGATG
3351 TACTGTTCTC TTGTTCACGT TTAGTGGTGT TTTGCTGTTT TGTTTTTTAA
3401 ACAAATGATG CTGAGAATAA GGAGAGAAAT GAATGTAGAG AGAGGTAGAG
3451 AGAGAAATAT GAACTCTAAC AAAGGACTGA GGAGTGCAGT CTGCTGGTTC
3501 AGGCTCTTCA AAAGATGTAG AAAAAGAGAT AGAAGGAACC ACCTATGCTT
3551 AAAATACTGT AAATATGCAG TGAGGTTTGG CAAAATCTAT TCCATGTGTG
3601 ATTTGCTTGT AGAAACAATT TTGAAAGCCC CTTGAGGAAA ATAAAAATCA
3651 AGAAGAACAC TTTTCTCCCT TTTCCATACA AATTAAAACT TAACAGCATC
3701 AAATTATTGG GACCAGAAAC CAAGTAATGT ATAATGTGGC TTTTGTTGAG
3751 TTAAATAAGA TGCTATATAA TGGAGAAGAA TTTGAAAATG CACAAAAAAA
3801 TCAATCTACA TTATCAGAAC CTGCAGTGAA ATTAAACTTA TGTTAAATAA
3851 AACCAGTTTG CAGGTGCACA AACTATGAGG GTCTTGTATC CACGTAACAC
3901 AGGTAGTTAC AAAAACATGT TATTGTACTG TGTAAAGATG CATAGTCATC
3951 TCATTTGGTT GGCTTTGTAC CTTGTACCTT TTTTAGCCTT GGCTTTTGTT
4001 GAACTAGAAC CCTCAGCACA TACTGTGTTG TACTTTTGTA AATGATTTTT
4051 TAAATGGAAT TTTGCACATA ATACATTGTA ATACTGTATG ATAATCATGT
4101 GTGAAAATAA TTTTTGAAAT AAAAAAAAAA AAAAAAA



BLAST Results 



Entry 122483 from database EMBL: 
Sequence 15 from patent US 5527896. 
Length = 1829
Plus Strand HSPs:

Score = 9097 (1364.9 bits), Expect = 0.0, P = 0.0
Identities = 1821/1823 (99%), Positives = 1821/1823 (99%)



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 20 bp to 2602 bp; peptide length: 861 
Category: known protein 

Classification: Cell signaling/communication 



1 MVRTDVNLEN GLEPAETHSM VRHKDGGYSE EEDVKTCARD SGYDSLSNRL 

51 SILDRLLHTH PIWLQLSLSE EEAAEVLQAQ PPGIFLVHKS TKMQKKVLSL 

101 RLPCEFGAPL KEFAIKESTY TFSLEGSGIS FADLFRLIAF YCISRDVLPF 

151 TLKLPYAIST AKSEAQLEEL AQMGLNFWSS PADSKPPNLP PPHRPLSSDG 

201 VCPASLRQLC LINGVHSIKT RTPSELECSQ TNGALCFINP LFLKVHSQDL 

251 SGGLKRPSTR TPNANGTERT RSPPPRPPPP AINSLHTSPR LARTETQTSM 

301 PETVNHNKHG NVALPGTKPT PIPPPRLKKQ ASFLEAEGGA KTLSGGRPGA 

351 GPELELGTAG SPGGAPPEAA PGDCTRAPPP SSESRPPCHG GRQRLSDMSI 

401 STSSSDSLEF DRSMPLFGYE ADTNSSLEDY EGESDQETMA PPIKSKKKRS 

451 SSFVLPKLVK SQLQKVSGVF SSFMTPEKRM VRRIAELSRD KCTYFGCLVQ 

501 DYVSFLQENK ECHVSSTDML QTIRQFMTQV KNYLSQSSEL DPPIESLIPE 

551 DQIDVVLEKA MHKCILKPLK GHVEAMLKDF HMADGSWKQL KENLQLVRQR 

601 NPQELGVFAP TPDFVDVEKI KVKFMTMQKM YSPEKKVMLL LRVCKLIYTV 

651 MENNSGRMYG ADDFLPVLTY VIAQCDMLEL DTEIEYMMEL LDPSLLHGEG 

701 GYYLTSAYGA LSLIKNFQEE QAARLLSSET RDTLRQWHKR RTTNRTIPSV 

751 DDFQNYLRVA FQEVNSGCTG KTLLVRPYIT TEDVCQICAE KFKVGDPEEY 

801 SLFLFVDETW QQLAEDTYPQ KIKAELHSRP QPHIFHFVYK RIKNDPYGII 
851 FQNGEEDLTT S 
BLASTP hits

No BLAST P hits available 

Alert BLASTP hits for DKFZphutel_20g21, frame 2

TREMBL:RNU80076_1 product: "RIN1"; Rattus norvegicus RIN1 mRNA, complete cds., N = 3, Score = 606, P = 6.8e-97

PIR:A38637 Ras interactor RIN1 - human, N = 3, Score = 587, P = 1.9e-92

TREMBL:HSRASINL_1 product: "ras inhibitor"; Human ras inhibitor mRNA, 3' end., N = 2, Score = 592, P = 9.8e-61

SWISSPROT:RIN1_HUMAN RAS INTERACTION/INTERFERENCE PROTEIN 1 (RAS INHIBITOR JC99) (FRAGMENT)., N = 2, Score = 587, P = 4.1e-60

PIR:B38637 Ras inhibitor (clone JC265) - human (fragment), N = 1, Score = 2446, P = 4.6e-254

>PIR:B38637 Ras inhibitor (clone JC265) - human (fragment)
Length = 471

HSPs:

Score = 2446 (367.0 bits), Expect = 4.6e-254, P = 4.6e-254
Identities = 471/471 (100%), Positives = 471/471 (100%)

Query: 391 GRQRLSDMSISTSSSDSLEFDRSMPLFGYEADTNSSLEDYEGESDQETMAPPIKSKKKRS 450 

GRQRLSDMSISTSSSDSLEFDRSMPLFGYEADTNSSLEDYEGESDQETMAPPIKSKKKRS 
Sbjct: 1 GRQRLSDMSISTSSSDSLEFDRSMPLFGYEADTNSSLEDYEGESDQETMAPPIKSKKKRS 60 

Query: 451 SSFVLPKLVKSQLQKVSGVFSSFMTPEKRMVRRIAELSRDKCTYFGCLVQDYVSFLQENK 510

SSFVLPKLVKSQLQKVSGVFSSFMTPEKRMVRRIAELSRDKCTYFGCLVQDYVSFLQENK 
Sbjct: 61 SSFVLPKLVKSQLQKVSGVFSSFMTPEKRMVRRIAELSRDKCTYFGCLVQDYVSFLQENK 120 

Query: 511 ECHVSSTDMLQTIRQFMTQVKNYLSQSSELDPPIESLIPEDQIDVVLEKAMHKCILKPLK 570 

ECHVSSTDMLQTIRQFMTQVKNYLSQSSELDPPIESLIPEDQIDVVLEKAMHKCILKPLK
Sbjct: 121 ECHVSSTDMLQTIRQFMTQVKNYLSQSSELDPPIESLIPEDQIDVVLEKAMHKCILKPLK 180

Query: 571 GHVEAMLKDFHMADGSWKQLKENLQLVRQRNPQELGVFAPTPDFVDVEKIKVKFMTMQKM 630 

GHVEAMLKDFHMADGSWKQLKENLQLVRQRNPQELGVFAPTPDFVDVEKIKVKFMTMQKM 
Sbjct: 181 GHVEAMLKDFHMADGSWKQLKENLQLVRQRNPQELGVFAPTPDFVDVEKIKVKFMTMQKM 240 

Query: 631 YSPEKKVMLLLRVCKLIYTVMENNSGRMYGADDFLPVLTYVIAQCDMLELDTEIEYMMEL 690 

YSPEKKVMLLLRVCKLIYTVMENNSGRMYGADDFLPVLTYVIAQCDMLELDTEIEYMMEL 
Sbjct: 241 YSPEKKVMLLLRVCKLIYTVMENNSGRMYGADDFLPVLTYVIAQCDMLELDTEIEYMMEL 300 

Query: 691 LDPSLLHGEGGYYLTSAYGALSLIKNFQEEQAARLLSSETRDTLRQWHKRRTTNRTIPSV 750 

LDPSLLHGEGGYYLTSAYGALSLIKNFQEEQAARLLSSETRDTLRQWHKRRTTNRTIPSV 
Sbjct: 301 LDPSLLHGEGGYYLTSAYGALSLIKNFQEEQAARLLSSETRDTLRQWHKRRTTNRTIPSV 360 

Query: 751 DDFQNYLRVAFQEVNSGCTGKTLLVRPYITTEDVCQICAEKFKVGDPEEYSLFLFVDETW 810 

DDFQNYLRVAFQEVNSGCTGKTLLVRPYITTEDVCQICAEKFKVGDPEEYSLFLFVDETW 
Sbjct: 361 DDFQNYLRVAFQEVNSGCTGKTLLVRPYITTEDVCQICAEKFKVGDPEEYSLFLFVDETW 420 

Query: 811 QQLAEDTYPQKIKAELHSRPQPHIFHFVYKRIKNDPYGIIFQNGEEDLTTS 861

QQLAEDTYPQKIKAELHSRPQPHIFHFVYKRIKNDPYGIIFQNGEEDLTTS
Sbjct: 421 QQLAEDTYPQKIKAELHSRPQPHIFHFVYKRIKNDPYGIIFQNGEEDLTTS 471



Pedant information for DKFZphutel_20g21, frame 2 



Report for DKFZphutel_20g21.2

[LENGTH] 861
[MW] 96380.26
[pI] 6.15
[HOMOL] PIR:B38637 Ras inhibitor (clone JC265) - human (fragment) 0.0
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YML097c] 3e-10
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YML097c] 3e-10
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YML097c] 3e-10
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YML097c] 3e-10
[PIRKW] alternative splicing 3e-59
[SUPFAM] Ras interactor RIN1 3e-59
[KW] All_Alpha
[KW] LOW_COMPLEXITY 11.27 %

SEQ MVRTDVNLENGLEPAETHSMVRHKDGGYSEEEDVKTCARDSGYDSLSNRLSILDRLLHTH 

SEG 

PRD ccccceeeccccccccceeeeeecccccccccceeeeeeccccccchhhhhhhhhhhhhh 

SEQ PIWLQLSLSEEEAAEVLQAQPPGIFLVHKSTKMQKKVLSLRLPCEFGAPLKEFAIKESTY 

SEG . . . xxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhccccceeeeechhhhhhhhhhhcccccccccceeeeeeecc 

SEQ TFSLEGSGISFADLFRLIAFYCISRDVLPFTLKLPYAISTAKSEAQLEELAQMGLNFWSS 

SEG 

PRD ceeecccccchhhhhhhhhhhhhcceeeeeecccchhhhhhhhhhhhhhhhhhccccccc 

SEQ PADSKPPNLPPPHRPLSSDGVCPASLRQLCLINGVHSIKTRTPSELECSQTNGALCFINP 

SEG xxxxxxxxxx 

PRD cccccccccccccccccccccccchhhhhhcccccccccccccccccccccccceeeecc 

SEQ LFLKVHSQDLSGGLKRPSTRTPNANGTERTRSPPPRPPPPAINSLHTSPRLARTETQTSM 

SEG xxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PETVNHNKHGNVALPGTKPTPIPPPRLKKQASFLEAEGGAKTLSGGRPGAGPELELGTAG 

SEG xxxxxxxxxxx xx 

PRD eeeeeccccccccccccccccccccchhhhhhhhhhhccccccccccccccceeeeeccc 

SEQ SPGGAPPEAAPGDCTRAPPPSSESRPPCHGGRQRLSDMSISTSSSDSLEFDRSMPLFGYE 

SEG xxxxxxxxxxxx xxxxxxxxxx xxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccceeeccccccceee 

SEQ ADTNSSLEDYEGESDQETMAPPIKSKKKRSSSFVLPKLVKSQLQKVSGVFSSFMTPEKRM 

SEG xxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhcchhhh 

SEQ VRRIAELSRDKCTYFGCLVQDYVSFLQENKECHVSSTDMLQTIRQFMTQVKNYLSQSSEL 

SEG 

PRD hhhhhhhhhhchhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhhcc 

SEQ DPPIESLIPEDQIDVVLEKAMHKCILKPLKGHVEAMLKDFHMADGSWKQLKENLQLVRQR 

SEG 

PRD ccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhhhccccchhhhhhhhhhhhh 

SEQ NPQELGVFAPTPDFVDVEKIKVKFMTMQKMYSPEKKVMLLLRVCKLIYTVMENNSGRMYG

SEG 

PRD ccccccccccccccchhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhcccccc 

SEQ ADDFLPVLTYVIAQCDMLELDTEIEYMMELLDPSLLHGEGGYYLTSAYGALSLIKNFQEE

SEG 

PRD cccccccceeecccccchhhhhhhhhhhhhhcccccccccceeeeehhhhhhhhhhhhhh 

SEQ QAARLLSSETRDTLRQWHKRRTTNRTIPSVDDFQNYLRVAFQEVNSGCTGKTLLVRPYIT 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhccccccceeeeecccccc 

SEQ TEDVCQICAEKFKVGDPEEYSLFLFVDETWQQLAEDTYPQKIKAELHSRPQPHIFHFVYK

SEG 

PRD chhhhhhhhhheeecccccceeeeehhhhhhcccccccchhhhhhhhhccccceeeehhh 

SEQ RIKNDPYGIIFQNGEEDLTTS

SEG 

PRD hhccccceeeeeccccccccc 

(No Prosite data available for DKFZphutel_20g21.2)
(No Pfam data available for DKFZphutel_20g21.2)
DKFZphutel_20hl3



group: intracellular transport and trafficking 

DKFZphutel_20hl3 encodes a novel 955 amino acid protein with similarity to alpha-adaptins.

Adaptins are components of the adaptor complexes that link clathrin to receptors in coated
vesicles. The alpha-adaptins, which are found exclusively in endocytic coated vesicles,
separate into two bands on SDS gels, designated A and C. The novel protein is very similar to
both alpha-adaptin A and alpha-adaptin C. The novel protein is a new human alpha-adaptin.

The new protein can find application in modulating endocytosis and vesicle trafficking in
cells.



strong similarity to alpha-adaptins 

complete cDNA, complete cds starting at bp 78, EST hits

Sequenced by AGOWA

Locus: unknown

Insert length: 3352 bp 

Poly A stretch at pos. 3297, polyadenylation signal at pos. 3279 
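The report gives positions for the poly(A) stretch and the polyadenylation signal without saying how they were located. The sketch below shows one plausible way to find such features in a cleaned nucleotide string. It assumes the canonical hexamer AATAAA plus its common variant ATTAAA, 1-based coordinates, and a minimum tail of 10 A's; these are illustrative assumptions, not the parameters of the original annotation pipeline, so its coordinates may not match the reported ones exactly. The function names are hypothetical.

```python
from __future__ import annotations

import re

# Assumed signal hexamers: the canonical AATAAA and its common variant ATTAAA.
SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, hexamer) for every signal occurrence, including overlaps."""
    seq = seq.upper()
    hits = []
    for motif in SIGNALS:
        # A zero-width lookahead lets finditer report overlapping matches.
        for m in re.finditer(f"(?={motif})", seq):
            hits.append((m.start() + 1, motif))
    return sorted(hits)


def polya_tail_start(seq: str, min_len: int = 10) -> int | None:
    """Return the 1-based start of the 3'-terminal run of A's, or None if shorter than min_len."""
    seq = seq.upper()
    body = seq.rstrip("A")
    if len(seq) - len(body) < min_len:
        return None
    return len(body) + 1


if __name__ == "__main__":
    # Toy sequence, not taken from the clone: signal at position 5, tail starting at 15.
    demo = "GCGCAATAAAGCTG" + "A" * 12
    print(find_polya_signals(demo))  # [(5, 'AATAAA')]
    print(polya_tail_start(demo))    # 15
```

Scanning for the signal and for the tail separately keeps the two reported positions independent, which mirrors how the report lists them.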



1 GCGCCCGGTC CCCGCTTGCC AGCCCCCGCT GCTCTGTGCC CTGTCCGGCC
51 AGGCCTGGAG CCGACACCAC CGCCATCATG CCGGCCGTGT CCAAGGGCGA
101 TGGGATGCGG GGGCTCGCGG TGTTCATCTC CGACATCCGG AACTGTAAGA
151 GCAAAGAGGC GGAAATTAAG AGAATCAACA AGGAACTGGC CAACATCCGC
201 TCCAAGTTCA AAGGAGACAA AGCCTTGGAT GGCTACAGTA AGAAAAAATA
251 TGTGTGTAAA CTGCTTTTCA TCTTCCTGCT TGGCCATGAC ATTGACTTTG
301 GGCACATGGA GGCTGTGAAT CTGTTGAGTT CCAATAAATA CACAGAGAAG
351 CAAATAGGTT ACCTGTTCAT TTCTGTGCTG GTGAACTCGA ACTCGGAGCT
401 GATCCGCCTC ATCAACAACG CCATCAAGAA TGACCTGGCC AGCCGCAACC
451 CCACCTTCAT GTGCCTGGCC CTGCACTGCA TCGCCAACGT GGGCAGCCGG
501 GAGATGGGCG AGGCCTTTGC CGCTGACATC CCCCGCATCC TGGTGGCCGG
551 GGACAGCATG GACAGTGTCA AGCAGAGTGC GGCCCTGTGC CTCCTTCGAC
601 TGTACAAGGC CTCGCCTGAC CTGGTGCCCA TGGGCGAGTG GACGGCGCGT
651 GTGGTACACC TGCTCAATGA CCAGCACATG GGTGTGGTCA CGGCCGCCGT
701 CAGCCTCATC ACCTGTCTCT GCAAGAAGAA CCCAGATGAC TTCAAGACGT
751 GCGTCTCTCT GGCTGTGTCG CGCCTGAGCC GGATCGTCTC CTCTGCCTCC
801 ACCGACCTCC AGGACTACAC CTACTACTTC GTCCCAGCAC CCTGGCTCTC
851 GGTGAAGCTC CTGCGGCTGC TGCAGTGCTA CCCGCCTCCA GAGGATGCGG
901 CTGTGAAGGG GCGGCTGGTG GAATGTCTGG AGACTGTGCT CAACAAGGCC
951 CAGGAGCCCC CCAAATCCAA GAAGGTGCAG CATTCCAACG CCAAGAACGC
1001 CATCCTCTTC GAGACCATCA GCCTCATCAT CCACTATGAC AGTGAGCCCA
1051 ACCTCCTGGT TCGGGCCTGC AACCAGCTGG GCCAGTTCCT GCAGCACCGG
1101 GAGACCAACC TGCGCTACCT GGCCCTGGAG AGCATGTGCA CGCTGGCCAG
1151 CTCCGAGTTC TCCCATGAAG CCGTCAAGAC GCACATTGAC ACCGTCATCA
1201 ATGCCCTCAA GACGGAGCGG GACGTCAGCG TGCGGCAGCG GGCGGCTGAC
1251 CTCCTCTACG CCATGTGTGA CCGGAGCAAT GCCAAGCAGA TCGTGTCGGA
1301 GATGCTGCGG TACCTGGAGA CGGCAGACTA CGCCATCCGC GAGGAGATCG
1351 TCCTGAAGGT GGCCATCCTG GCCGAGAAGT ACGCCGTGGA CTACAGCTGG
1401 TACGTGGACA CCATCCTCAA CCTCATCCGC ATTGCGGGCG ACTACGTGAG
1451 TGAGGAGGTG TGGTACCGTG TGCTACAGAT CGTCACCAAC CGTGATGACG
1501 TCCAGGGCTA TGCCGCCAAG ACCGTCTTTG AGGCGCTCCA GGCCCCTGCC
1551 TGTCACGAGA ACATGGTGAA GGTTGGCGGC TACATCCTTG GGGAGTTTGG
1601 GAACCTGATT GCTGGGGACC CCCGCTCCAG CCCCCCAGTG CAGTTCTCCC
1651 TGCTCCACTC CAAGTTCCAT CTGTGCAGCG TGGCCACGCG GGCGCTGCTG
1701 CTGTCCACCT ACATCAAGTT CATCAACCTC TTCCCCGAGA CCAAGGCCAC
1751 CATCCAGGGC GTCCTGCGGG CCGGCTCCCA GCTGCGCAAT GCTGACGTGG
1801 AGCTGCAGCA GCGAGCCGTG GAGTACCTCA CCCTCAGCTC AGTGGCCAGC
1851 ACCGACGTCC TGGCCACGGT GCTGGAGGAG ATGCCGCCCT TCCCCGAGCG
1901 CGAGTCGTCC ATCCTGGCCA AGCTGAAACG CAAGAAGGGG CCAGGGGCCG
1951 GCAGCGCCCT GGACGATGGC CGGAGGGACC CCAGCAGCAA CGACATCAAC
2001 GGGGGCATGG AGCCCACCCC CAGCACTGTG TCGACGCCCT CGCCCTCCGC
2051 CGACCTCCTG GGGCTGCGGG CAGCCCCTCC CCCGGCAGCA CCCCCGGCTT
2101 CTGCAGGAGC AGGGAACCTT CTGGTGGACG TCTTCGATGG CCCGGCCGCC
2151 CAGCCCAGCC TGGGGCCCAC CCCCGAGGAG GCCTTCCTCA GCCCAGGTCC
2201 TGAGGACATC GGCCCTCCCA TTCCGGAAGC CGATGAGTTG CTGAATAAGT
2251 TTGTGTGTAA GAACAACGGG GTCCTGTTCG AGAACCAGCT GCTGCAGATC
2301 GGAGTCAAGT CAGAGTTCCG ACAGAACCTG GGCCGCATGT ATCTCTTCTA
2351 TGGCAACAAG ACCTCGGTGC AGTTCCAGAA TTTCTCACCC ACTGTGGTTC
2401 ACCCGGGAGA CCTCCAGACT CAGCTGGCTG TGCAGACCAA GCGCGTGGCG
2451 GCGCAGGTGG ACGGCGGCGC GCAGGTGCAG CAGGTGCTCA ATATCGAGTG
2501 CCTGCGGGAC TTCCTGACGC CCCCGCTGCT GTCCGTGCGC TTCCGGTACG
2551 GTGGCGCCCC CCAGGCCCTC ACCCTGAAGC TCCCAGTGAC CATCAACAAG
2601 TTCTTCCAGC CCACCGAGAT GGCGGCCCAG GATTTCTTCC AGCGCTGGAA

2651 GCAGCTGAGC CTCCCTCAAC AGGAGGCGCA GAAAATCTTC AAAGCCAACC 

2701 ACCCCATGGA CGCAGAAGTT ACTAAGGCCA AGCTTCTGGG GTTTGGCTCT 

2751 GCTCTCCTGG ACAATGTGGA CCCCAACCCT GAGAACTTCG TGGGGGCGGG 

2801 GATCATCCAG ACTAAAGCCC TGCAGGTGGG CTGTCTGCTT CGGCTGGAGC 

2851 CCAATGCCCA GGCCCAGATG TACCGGCTGA CCCTGCGCAC CAGCAAGGAG 

2901 CCCGTCTCCC GTCACCTGTG TGAGCTGCTG GCACAGCAGT TCTGAGCCCT 

2951 GGACTCTGCC CCGGGGGATG TGGCCGGCAC TGGGCAGCCC CTTGGACTGA 

3001 GGCAGTTTTG GTGGATGGGG GACCTCCACT GGTGACAGAG AAGACACCAG 

3051 GGTTTGGGGG ATGCCTGGGA CTTTCCTCCG GCCTTTTGTA TTTTTATTTT 

3101 TGTTCATCTG CTGCTGTTTA CATTCTGGGG GGTTAGGGGG AGTCCCCCTC 

3151 CCTCCCTTTC CCCCCCAAGC ACAGAGGGGA GAGGGGCCAG GGAAGTGGAT 

3201 GTCTCCTCCC CTCCCACCCC ACCCTGTTGT AGCCCCTCCT ACCCCCTCCC 

3251 CATCCAGGGG CTGTGTATTA TTGTGAGCGA ATAAACAGAG AGACGCTAAA 

3301 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

3351 AA 
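The nucleotide listings in this document use a numbered layout. Each row starts with the 1-based position of its first base, followed by blocks of ten bases, fifty bases per row. Several rows were damaged during text extraction, so a parser that rebuilds the contiguous sequence and checks every row number against the running length is a useful sanity check. The sketch below assumes that only A, C, G, T and N are valid bases; the function name is hypothetical and not part of the original analysis software.

```python
VALID_BASES = set("ACGTN")


def parse_numbered_listing(text: str) -> tuple[int, str]:
    """Rebuild a contiguous sequence from a numbered listing.

    Returns (position of the first base, sequence). Raises ValueError when a row
    number disagrees with the number of bases read so far, or when a row contains
    characters other than A, C, G, T or N (for example extraction debris).
    """
    first = None
    chunks: list[str] = []
    length = 0
    for lineno, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields:
            continue  # blank separator lines between rows
        pos = int(fields[0])
        bases = "".join(fields[1:]).upper()
        bad = set(bases) - VALID_BASES
        if bad:
            raise ValueError(f"line {lineno}: unexpected characters {sorted(bad)}")
        if first is None:
            first = pos
        elif pos != first + length:
            raise ValueError(f"line {lineno}: row starts at {pos}, expected {first + length}")
        chunks.append(bases)
        length += len(bases)
    if first is None:
        raise ValueError("empty listing")
    return first, "".join(chunks)
```

Feeding it the rows beginning at 2601 and 2651 above, for example, should return 2601 and a 100-base string. A row with stray punctuation or a misnumbered start is reported instead of being silently merged.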



BLAST Results 



No BLAST result 



Medline entries 



89155572: 

Cloning of cDNAs encoding two related 100-kD coated vesicle proteins
(alpha-adaptins).

97431776:

Alpha-adaptin, a marker for endocytosis, is expressed in complex
patterns during Drosophila development.



Peptide information for frame 3 



ORF from 78 bp to 2942 bp; peptide length: 955 
Category: strong similarity to known protein 
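Each peptide header pairs the ORF boundaries with the peptide length, and the two are related by simple arithmetic. For DKFZphutel_20hl3, (2942 - 78 + 1) / 3 = 2865 / 3 = 955 codons. The sketch below applies the same calculation to the three ORFs reported in this section. It assumes 1-based inclusive coordinates that exclude the stop codon. The report does not state this convention, but in each of the three nucleotide listings a stop codon (TGA, TAG, TAG) follows immediately after the end coordinate. The function name is hypothetical.

```python
def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumes the coordinates cover the sense codons only, i.e. the stop codon
    lies just after `end`.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a positive multiple of 3")
    return span // 3


# (ORF start, ORF end, reported peptide length), copied from the reports in this section.
REPORTED = {
    "DKFZphutel_20hl3": (78, 2942, 955),
    "DKFZphutel_20mll": (202, 876, 225),
    "DKFZphutel_20m24": (23, 1855, 611),
}

if __name__ == "__main__":
    for clone, (start, end, reported) in REPORTED.items():
        computed = orf_peptide_length(start, end)
        print(clone, computed, "ok" if computed == reported else "MISMATCH")
```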



1 MPAVSKGDGM RGLAVFISDI RNCKSKEAEI KRINKELANI RSKFKGDKAL
51 DGYSKKKYVC KLLFIFLLGH DIDFGHMEAV NLLSSNKYTE KQIGYLFISV
101 LVNSNSELIR LINNAIKNDL ASRNPTFMCL ALHCIANVGS REMGEAFAAD
151 IPRILVAGDS MDSVKQSAAL CLLRLYKASP DLVPMGEWTA RVVHLLNDQH
201 MGVVTAAVSL ITCLCKKNPD DFKTCVSLAV SRLSRIVSSA STDLQDYTYY
251 FVPAPWLSVK LLRLLQCYPP PEDAAVKGRL VECLETVLNK AQEPPKSKKV
301 QHSNAKNAIL FETISLIIHY DSEPNLLVRA CNQLGQFLQH RETNLRYLAL
351 ESMCTLASSE FSHEAVKTHI DTVINALKTE RDVSVRQRAA DLLYAMCDRS
401 NAKQIVSEML RYLETADYAI REEIVLKVAI LAEKYAVDYS WYVDTILNLI
451 RIAGDYVSEE VWYRVLQIVT NRDDVQGYAA KTVFEALQAP ACHENMVKVG
501 GYILGEFGNL IAGDPRSSPP VQFSLLHSKF HLCSVATRAL LLSTYIKFIN
551 LFPETKATIQ GVLRAGSQLR NADVELQQRA VEYLTLSSVA STDVLATVLE
601 EMPPFPERES SILAKLKRKK GPGAGSALDD GRRDPSSNDI NGGMEPTPST
651 VSTPSPSADL LGLRAAPPPA APPASAGAGN LLVDVFDGPA AQPSLGPTPE
701 EAFLSPGPED IGPPIPEADE LLNKFVCKNN GVLFENQLLQ IGVKSEFRQN
751 LGRMYLFYGN KTSVQFQNFS PTVVHPGDLQ TQLAVQTKRV AAQVDGGAQV
801 QQVLNIECLR DFLTPPLLSV RFRYGGAPQA LTLKLPVTIN KFFQPTEMAA
851 QDFFQRWKQL SLPQQEAQKI FKANHPMDAE VTKAKLLGFG SALLDNVDPN
901 PENFVGAGII QTKALQVGCL LRLEPNAQAQ MYRLTLRTSK EPVSRHLCEL
951 LAQQF



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_20hl3, frame 3 

PIR:B30111 alpha-adaptin C - mouse, N = 1, Score = 3990, P = 0

PIR:S11276 alpha-adaptin c - rat, N = 1, Score = 3987, P = 0

SWISSPROT:ADAC_RAT ALPHA-ADAPTIN C (CLATHRIN ASSEMBLY PROTEIN COMPLEX 2
ALPHA-C LARGE CHAIN) (100 KD COATED VESICLE PROTEIN C) (PLASMA MEMBRANE
ADAPTOR HA2/AP2 ADAPTIN ALPHA C SUBUNIT)., N = 1, Score = 3982, P = 0

SWISSPROT:ADAC_MOUSE ALPHA-ADAPTIN C (CLATHRIN ASSEMBLY PROTEIN COMPLEX
2 ALPHA-C LARGE CHAIN) (100 KD COATED VESICLE PROTEIN C) (PLASMA
MEMBRANE ADAPTOR HA2/AP2 ADAPTIN ALPHA C SUBUNIT)., N = 1, Score =
3976, P = 0

TREMBL:AB020706_1 gene: "KIAA0899"; product: "KIAA0899 protein"; Homo
sapiens mRNA for KIAA0899 protein, partial cds., N = 1, Score = 3932, P
= 0



>PIR:B30111 alpha-adaptin C - mouse 
Length = 938

HSPs: 

Score = 3990 (598.6 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 787/955 (82%), Positives = 858/955 (89%)

Query: 1 MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 60 

MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 
Sbjct: 1 MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 60 

Query: 61 KLLFIFLLGHDIDFGHMEAVNLLSSNKYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 120

KLLFIFLLGHDIDFGHMEAVNLLSSN+YTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 
Sbjct: 61 KLLFIFLLGHDIDFGHMEAVNLLSSNRYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 120

Query: 121 ASRNPTFMCLALHCIANVGSREMGEAFAADIPRILVAGDSMDSVKQSAALCLLRLYKASP 180 

ASRNPTFM LALHCIANVGSREM EAFA +IP+ILVAGD+MDSVKQSAALCLLRLY+ SP
Sbjct: 121 ASRNPTFMGLALHCIANVGSREMAEAFAGEIPKILVAGDTMDSVKQSAALCLLRLYRTSP 180 

Query: 181 DLVPMGEWTARVVHLLNDQHMGVVTAAVSLITCLCKKNPDDFKTCVSLAVSRLSRIVSSA 240

DLVPMG+WT+RVVHLLNDQH+GVVTAA SLIT L +KNP++FKT VSLAVSRLSRIV+SA 
Sbjct: 181 DLVPMGDWTSRVVHLLNDQHLGVVTAATSLITTLAQKNPEEFKTSVSLAVSRLSRIVTSA 240

Query: 241 STDLQDYTYYFVPAPWLSVKLLRLLQCYPPPEDAAVKGRLVECLETVLNKAQEPPKSKKV 300 

STDLQDYTYYFVPAPWLSVKLLRLLQCYPPP D AV+GRL ECLET+LNKAQEPPKSKKV 
Sbjct: 241 STDLQDYTYYFVPAPWLSVKLLRLLQCYPPP-DPAVRGRLTECLETILNKAQEPPKSKKV 299 

Query: 301 QHSNAKNAILFETISLIIHYDSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 360 

QHSNAKNA+LFE ISLIIH+DSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 
Sbjct: 300 QHSNAKNAVLFEAISLIIHHDSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 359

Query: 361 FSHEAVKTHIDTVINALKTERDVSVRQRAADLLYAMCDRSNAKQIVSEMLRYLETADYAI 420 

FSHEAVKTHI+TVINALKTERDVSVRQRA DLLYAMCDRSNA+QIV+EML YLETADY+I 
Sbjct: 360 FSHEAVKTHIETVINALKTERDVSVRQRAVDLLYAMCDRSNAQQIVAEMLSYLETADYSI 419

Query: 421 REEIVLKVAILAEKYAVDYSWYVDTILNLIRIAGDYVSEEVWYRVLQIVTNRDDVQGYAA 480

REEIVLKVAILAEKYAVDY+WYVDTILNLIRIAGDYVSEEVWYRV+QIV NRDDVQGYAA 
Sbjct: 420 REEIVLKVAILAEKYAVDYTWYVDTILNLIRIAGDYVSEEVWYRVIQIVINRDDVQGYAA 479 

Query: 481 KTVFEALQAPACHENMVKVGGYILGEFGNLIAGDPRSSPPVQFSLLHSKFHLCSVATRAL 540 

KTVFEALQAPACHEN+VKVGGYILGEFGNLIAGDPRSSP +QF+LLHSKFHLCSV TRAL 
Sbjct: 480 KTVFEALQAPACHENLVKVGGYILGEFGNLIAGDPRSSPLIQFNLLHSKFHLCSVPTRAL 539 

Query: 541 LLSTYIKFINLFPETKATIQGVLRAGSQLRNADVELQQRAVEYLTLSSVASTDVLATVLE 600 

LLSTYIKF+NLFPE KATIQ VLR+ SQL+NADVELQQRAVEYL LS+VASTD+LATVLE 
Sbjct: 540 LLSTYIKFVNLFPEVKATIQDVLRSDSQLKNADVELQQRAVEYLRLSTVASTDILATVLE 599 

Query: 601 EMPPFPERESSILAKLKRKKGPGAGSALDDGRRDPSSNDINGGMEPTP STVSTPSPS 657 

EMPPFPERESSILAKLK+KKGP + L++ +R+ S D+NGG EP P S STPSPS 
Sbjct: 600 EMPPFPERESSILAKLKKKKGPSTVTDLEETKRERSI-DVNGGPEPVPASTSAASTPSPS 658 

Query: 658 ADLLGLRAAPP-PAAPPASAGAGNLLVDVFDGPAAQPSLGPTPEEAFLSPGPEDIGPPIP 716 

ADLLGL A PP P  PP S+G G LLVDVF   A+  ++ P      L+PG ED
Sbjct: 659 ADLLGLGAVPPAPTGPPPSSGGG-LLVDVFSDSAS--AVAP------LAPGSEDN----- 704

Query: 717 EADELLNKFVCKNNGVLFENQLLQIGVKSEFRQNLGRMYLFYGNKTSVQFQNFSPTVVHP 776 

+FVCKNNGVLFENQLLQIG+KSEFRQNLGRM++FYGNKTS QF NF+PT++ 
Sbjct: 705 FARFVCKNNGVLFENQLLQIGLKSEFRQNLGRMFIFYGNKTSTQFLNFTPTLICA 759 

Query: 777 GDLQTQLAVQTKRVAAQVDGGAQVQQVLNIECLRDFLTPPLLSVRFRYGGAPQALTLKLP 836 

 DLQT L +QTK V   VDGGAQVQQV+NIEC+ DF   P+L+++FRYGG  Q +++KLP
Sbjct: 760 DDLQTNLNLQTKPVDPTVDGGAQVQQVVNIECISDFTEAPVLNIQFRYGGTFQNVSVKLP 819 

Query: 837 VTINKFFQPTEMAAQDFFQRWKQLSLPQQEAQKIFKANHPMDAEVTKAKLLGFGSALLDN 896 

+T+NKFFQPTEMA+QDFFQRWKQLS PQQE Q IFKA HPMD E+TKAK++GFGSALL+ 
Sbjct: 820 ITLNKFFQPTEMASQDFFQRWKQLSNPQQEVQNIFKAKHPMDTEITKAKIIGFGSALLEE 879 

Query: 897 VDPNPENFVGAGIIQTKALQVGCLLRLEPNAQAQMYRLTLRTSKEPVSRHLCELLAQQF 955 
VDPNP NFVGAGII TK Q+GCLLRLEPN QAQMYRLTLRTSK+ VS+ LCELL++QF 
Sbjct: 880 VDPNPANFVGAGIIHTKTTQIGCLLRLEPNLQAQMYRLTLRTSKDTVSQRLCELLSEQF 938
Pedant information for DKFZphutel_20hl3, frame 3 
Report for DKFZphutel_20hl3.3

[LENGTH] 955

[MW] 105361.97

[pI] 7.75

[HOMOL] PIR:A30111 alpha-adaptin A - mouse 0.0

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBL037w] 5e-67

[FUNCAT] 08.19 cellular import [S. cerevisiae, YBL037w] 5e-67

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBL037w] 5e-67

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDR238c] 4e-04

[PIRKW] heterodimer 0.0

[PIRKW] transmembrane protein 1e-65

[PIRKW] membrane trafficking 0.0

[PIRKW] receptor 0.0

[SUPFAM] beta-adaptin 5e-16

[PROSITE] MYRISTYL 7

[PROSITE] IG_MHC 1

[PROSITE] AMIDATION 1

[PROSITE] CK2_PHOSPHO_SITE 11

[PROSITE] TYR_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 15

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 6.81 % 

SEQ MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 

SEG 

PRD ccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccchhhhhhh 

SEQ KLLFIFLLGHDIDFGHMEAVNLLSSNKYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 

SEG 

PRD hhhhhhhcccccccchhhhhhhhhcccccchhhhhhhhhhhhhcchhhhhhhhhhhhhcc 

SEQ ASRNPTFMCLALHCIANVGSREMGEAFAADIPRILVAGDSMDSVKQSAALCLLRLYKASP 

SEG 

PRD cccccchhhhhhhhhhccchhhhhhhhhhhhhheeeccccchhhhhhhhhhhhhhhhhcc 

SEQ DLVPMGEWTARVVHLLNDQHMGWTAAVSLITCLCKKNPDDFKTCVSLAVSRLSRIVSSA 

SEG 

PRD cccccccchhhhhhhhhcccceeeehhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhcc 

SEQ STDLQDYTYYFVPAPWLSVKLLRLLQCYPPPEDAAVKGRLVECLETVLNKAQEPPKSKKV 

SEG 

PRD ccccccceeeecccchhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhccccccc 

SEQ QHSNAKNAILFETISLIIHYDSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 

SEG 

PRD cccccchhhhhhhhhhhhhcccccceeeeehhhhhhhhhhccccceeeehhhhhhhhhcc 

SEQ FSHEAVKTHIDTVINALKTERDVSVRQRAADLLYAMCDRSNAKQIVSEMLRYLETADYAI 

SEG 

PRD cchhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhcccch 

SEQ REEIVLKVAILAEKYAVDYSWYVDTILNLIRIAGDYVSEEVWYRVLQIVTNRDDVQGYAA 

SEG 

PRD hhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhccccchhhhhhhheeeccccchhhhhh 

SEQ KTVFEALQAPACHENMVKVGGYILGEFGNLIAGDPRSSPPVQFSLLHSKFHLCSVATRAL 

SEG 

PRD hhhhhhhhhhcccccceeeeeeeecccccccccccccccchhhhhhhhhhhcccchhhhh 

SEQ LLSTYIKFINLFPETKATIQGVLRAGSQLRNADVELQQRAVEYLTLSSVASTDVLATVLE 

SEG 

PRD hhhhhhhhhhccccchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccchhhhhhhhhh 

SEQ EMPPFPERESSILAKLKRKKGPGAGSALDDGRRDPSSNDINGGMEPTPSTVSTPSPSADL 

SEG xxxxxxxxxxxxxxx 

PRD hccccccchhhhhhhhhhccccccccccccccccccccccccccccccccccccccccce 

SEQ LGLRAAPPPAAPPASAGAGNLLVDVFDGPAAQPSLGPTPEEAFLSPGPEDIGPPIPEADE

SEG xxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx

PRD eecccccccccccccccccceeeeeeccccccccccccccceeecccccccccccccccc

SEQ LLNKFVCKNNGVLFENQLLQIGVKSEFRQNLGRMYLFYGNKTSVQFQNFSPTVVHPGDLQ 

SEG 

PRD cceeeeeccccccchhhhhhhhcchhhhhccccceeeccccccccccccceeeeccchhh 

SEQ TQLAVQTKRVAAQVDGGAQVQQVLNIECLRDFLTPPLLSVRFRYGGAPQALTLKLPVTIN 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhcccccccccchhhhhhhhhhccccccccceeeeeeccccccccccccccccc 

SEQ KFFQPTEMAAQDFFQRWKQLSLPQQEAQKIFKANHPMDAEVTKAKLLGFGSALLDNVDPN 

SEG 

PRD cccccchhhhhhhhhhhhhhhchhhhhhhhhhhcccchhhhhhhhhhccccceeeecccc 

SEQ PENFVGAGIIQTKALQVGCLLRLEPNAQAQMYRLTLRTSKEPVSRHLCELLAQQF

SEG 

PRD ccceeeceeeeeccccceeeeecccchhhhhhhhhhhccccchhhhhhhhhhccc 



Prosite for DKFZphutel_20hl3.3 
PS00005 819->822 PKC_PHOSPHO_SITE PDOC00005
PS00005 832->835 PKC_PHOSPHO_SITE PDOC00005
PS00005 935->938 PKC_PHOSPHO_SITE PDOC00005
PS00005 938->941 PKC_PHOSPHO_SITE PDOC00005
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006
PS00006 368->372 CK2_PHOSPHO_SITE PDOC00006
PS00006 379->383 CK2_PHOSPHO_SITE PDOC00006
PS00006 470->474 CK2_PHOSPHO_SITE PDOC00006
PS00006 482->486 CK2_PHOSPHO_SITE PDOC00006
PS00006 597->601 CK2_PHOSPHO_SITE PDOC00006
PS00006 626->630 CK2_PHOSPHO_SITE PDOC00006
PS00006 636->640 CK2_PHOSPHO_SITE PDOC00006
PS00006 698->702 CK2_PHOSPHO_SITE PDOC00006
PS00006 938->942 CK2_PHOSPHO_SITE PDOC00006
PS00007 388->395 TYR_PHOSPHO_SITE PDOC00007
PS00007 411->419 TYR_PHOSPHO_SITE PDOC00007
PS00007 434->443 TYR_PHOSPHO_SITE PDOC00007
PS00008 202->208 MYRISTYL PDOC00008
PS00008 508->514 MYRISTYL PDOC00008
PS00008 561->567 MYRISTYL PDOC00008
PS00008 623->629 MYRISTYL PDOC00008
PS00008 759->765 MYRISTYL PDOC00008
PS00008 826->832 MYRISTYL PDOC00008
PS00008 908->914 MYRISTYL PDOC00008
PS00009 630->634 AMIDATION PDOC00009
PS00290 127->134 IG_MHC PDOC00262

(No Pfam data available for DKFZphutel_20hl3.3)
DKFZphutel_20mll



group: cell cycle 

DKFZphutel_20mll encodes a novel 225 amino acid protein with similarity to yeast sds22 and 
protein phosphatase-1 regulatory subunits. 

sds22 is a regulatory polypeptide of protein phosphatase-1 that is required for the completion
of mitosis in both fission and budding yeast. The novel protein seems to be a new regulatory
protein for protein phosphatase-1.

The new protein can find application in modulating/blocking the activity of protein 
phosphatase-1 and in modulating the cell cycle. 



similarity to suppressor protein sds22 

complete cDNA, complete cds, EST hits 
localisation uncertain: only part of the STS matches



Sequenced by AGOWA 
Locus: /map="ll n 7 
Insert length: 5822 bp 

Poly A stretch at pos. 5803, polyadenylation signal at pos. 5786



1 GGGCGCTTGG TTCCCCAGCA ACCGGGAGAC GCGTCTGCTG CGTGGAACCG
51 CCGAGTTCCC AGCGCTTGAG AAGGAAAATT CTGGATCTGT TATCTGTGAG
101 GAGGCCACTC CGTTGACAGT TGTGTAAAAC TCTGCTGCTT TCCCCAGCTC
151 CAACCTCTCT GGTCTTCAAC AACACTATCA TCAGGGAAAA CGTGGGGGAA
201 GATGAACCAG CCGTGCAACT CGATGGAGCC GAGGGTGATG GACGATGACA
251 TGCTCAAGCT GGCCGTCGGG GACCAGGGCC CCCAGGAGGA GGCCGGGCAG
301 CTGGCCAAGC AGGAGGGCAT CCTCTTCAAG GATGTCCTGT CCCTGCAGCT
351 GGACTTTCGG AACATCCTCC GCATAGACAA CCTCTGGCAG TTTGAGAACT
401 TGAGGAAGCT GCAGCTGGAC AATAACATCA TTGAGAAGAT CGAGGGCCTG
451 GAGAACCTCG CACACCTGGT CTGGCTGGAT CTGTCTTTCA ACAACATTGA
501 GACCATCGAG GGGCTGGACA CACTGGTGAA CCTGGAGGAC CTGAGCTTGT
551 TCAACAACCG GATCTCCAAG ATCGACTCCC TGGACGCCCT CGTCAAGCTG
601 CAGGTGTTGT CGCTGGGCAA CAACCGGATT GACAACATGA TGAACATCAT
651 CTACCTCCGG CGGTTCAAGT GCCTGCGGAC GCTCAGCCTC TCTAGGAACC
701 CTATCTCTGA GGCAGAGGAT TACAAGATGT TCATCTGTGC CTACCTTCCT
751 GACCTCATGT ACCTGGACTA CCGGCGCATT GATGACCACA CAGCAAGTGT
801 CTCCCTCTCA GTCTCCCAGC CCTGTGAGAC AGATTCCTCA AGCCCCCAGG
851 TTTCTTGGAA AAGGGGCATT GAAGAGTAGC TTCCCCTGCC CACAACTAGG
901 AGAGAAAGGG CAGCTCCCTC TTCCTAATCC CTTTACCTGA CTCTGTCAGA
951 GTGATTCCAG CAGCACCCTT GTAAGTACTG TTTTGTGTGC GTTCCCAGGG
1001 GCCAGGCCTC TTCCACACAC TGTCCCAGGG CCACCTCACA GCCATCCTGC
1051 ACTGTCTAGT TTTCCAGATG AAGAAGCTGA GGAGGGCTGG GAGCAGTGGC
1101 TCACGCCTGT AATCCCAGCA CTTTGAGAGG CTGAGGCGGG AGGATCGCTT
1151 GAGCCAAGGA GTTCAAGACC AGCCTGGGCA ACATAGGGAG ACCCCATCTC
1201 TACAGAAACT ACCAAAATTA GCCAGGTGTG GTGGCACACA CCAGTAATCC
1251 TGGCTACTCA CAAGGCCGAG GTAGAAGAAT CGCTTGAGAC TAGGAGTTTG
1301 AGGCTGCAGT GAACTAAGAA GATGCCATTG CACTCCAGCC TGGGCAACAG
1351 AGTGAAAAAA TTAAAAAATT AGAAAAGAAA AGAAGTTGAG GAGGCCCAAG
1401 GAGGGCAAGC AGCCAGGATC ACTGGCTCAA GGCCAAGCCA GGATTCACCC
1451 TAAGTTGGTG TCATCCCAGG AGCAATATTA ACAGCTGAGC TCCAGAGGGA
1501 ACCAGGCCAT CAGAGGCTCA GGCCTGGCTC TCAGGGGCAG AGTCAGGGCT
1551 GGAGGTAGAG ACCTGAGTGT CATCTGAGGA TTGCCAATTG GCAGTAGTTG
1601 AAGCCATGGT ACAGGTGGGA TCACCTGGGG CACATGGAGT GAGCTGGGGG
1651 ACGGGGACTA AGTTCTAGAG GTGCCAGCAT TCCTGGCCAG GTACAGGGGG
1701 ATGAGCCAGT GCGGTGGAGA GAGCCAAGGG CCAGACCCTC GTGACCAGCC
1751 CTATGGCCTC ACTCTACCTC TGTCCTGTTG TCCTCCTTCC CTAAAAGAGG
1801 GCCAGAAGGC CTGCTGAGGG CTGTTGGGAG TGAGAGAGCA AGTCCTCTGT
1851 GGAGAACACC CAGTCTGGGG CGAGGGGAGC GCTCCATTGC TGTGGCTCCT
1901 GCCCTGGAGA TGGCCCCGGG AACCCCAGCC TGCCACGCTG CCTTCCGCTC
1951 CTCCTGGTCT TTCCCTGATT TCCCTGCGCT CACAAAAACC TGGTGAGGGT
2001 CATCAGGAGA TGGGCATTCT CATCCACGAG ACCTCATGGC TTTCACAGCC
2051 TTCATGCAGG CCCCTGTGCA ACACCCCTGC CCATGCGCGG GAGGCTGCAG
2101 CATGGCAGAG GCGGCATGGC AGAGGCGGTG TGGCTCGGAG GAACCTCTGG
2151 TAACAATGCC ACTCCCGTTC CCTGGTCAGA AAAAGCTTGC GGAGGCTAAG
2201 CACCAGTACA GCATCGACGA GCTGAAGCAC CAGGAGAACC TGATGCAGGC
2251 CCAGCTGGAG GACGAGCAGG CGCAGCGGGA GGAGCTAGAG AAGCACAAGA
2301 CTGCGTTTGT GGAACACCTG AATGGCTCCT TCCTGTTTGA CAGCATGTAC
2351 GCTGAGGACT CAGAGGGCAA CAATCTGTCC TACCTGCCTG GTGTCGGTGA
2401 GCTCCTTGAG ACCTACAAGG ACAAGTTTGT CATCATCTGC GTGAATATTT
2451 TTGAGTATGG CCTGAAACAG CAGGAGAAGC GGAAAACAGA GCTTGACACC
2501 TTCAGTGAAT GTGTCCGTGA GGCCATCCAG GAAAACCAGG AGCAGGGCAA
2551 ACGCAAGATT GCCAAATTCG AGGAGAAGCA CTTGTCGAGT TTAAGTGCCA
2601 TTCGAGAGGA GTTGGAACTG CCCAACATTG AGAAGATGAT CCTAGAATGC
2651 AGTGCTGACA TCAGTGAGTT GTTCGATGCG CTCATGACGC TGGAGATGCA
2701 GCTGGTGGAG CAGCTGGAGG TAAGGCTGGG CCCTGGGCAC AAGTGCCAGA
2751 ATCTGGCGAT GCAGCTGCAC ATCCATAGGT GAACTGTAGC CTTCATGGGC
2801 ACGCCTCTGC TGGAAACGTC CAGCACGACT CAGCGTGGCA GGCTGTAGCT
2851 TTCTTGCTCA TCAGTCCTGT TTGCTTTTAT TACATTTTAA TCATTTACAT
2901 TGGAAGTGAT TCTTGTGGAA AATGAGAGGT GAGCTCATTC TTCTGAAATG
2951 GTCCCCCTAT CCTGGAAGTC AGTGGGGAGA GGTTTTTGAT TAGACCCCTG
3001 GAGCTATCCG GGTACTCTAA AGGCAAAGCG CACCCCCACT TGGGGACCAA
3051 ACAAAGACCC CTCCGCATTG CAGCCTGCAG TTGCCGCTTC TCAGGTGACG
3101 TGAGGAGGCT GCAACTCAGC ACTAAGTAGT GAAAATGAAA AGCGCCGCTG
3151 TCTGAAATTC ATTAGCAGCC AGAGTATGTG TTACAAGGCA GCGGAGGCTG
3201 GGAGTCTGAA GTGGTGTGAT GAATTGAACC TCATCGGATG CTGCTGTGGC
3251 TGGGCCAAGT GATAGCACCT AATCAATTCC TCACACGTCA AGTGACACCT
3301 CAGACATGGG ATAGATTTCC CCATCACATC ACAGGGCAGG TGCTCCCTCC
3351 CTGCTGGAGA GCACAGGCAC TGCAGAAGCA GCGCACAGTG CCAGGGGCGA
3401 GTGAGGCAGC AGCTCCCAGC CTTTTCAGGC ACGGAGATTG CCTTTCAACA
3451 TCCAAACATT TCCCAGAACC CATGTGCCAT CCTACTTGTA TTACTGGTGG
3501 CCAGAAAGCC ACAAGCGCAA TCATGCTTTT CAATGACCCT ATTTTTATTC
3551 ACGAGAACAG CACATACATG TGTTTGAAAA TTATGTGAGG TGCTCACTCT
3601 GCAGACAGTA CTCACATTCC TATAGATTCC ACCCCTGCCC ACCTTGCAGC
3651 CCCTGGAGTC TATAGCAGAT GGGAGTGGGG CACTCCGAGA GTGGCAGGCC
3701 TGGAGATCAC ATCTTCCATT GTTCCTTCAA TCAACACTAA CTCCCATTTG
3751 GGCCTTAGGT GCCTTGCTAA GCACCACAAA ACAGCAACTA ACTGAAAGAG
3801 ATCTGGAGTG CCAGCCCGCT CCTACTGAGG GCCTCCTCTC TGTCAGGCAC
3851 CTTGCAAAGC ATTTTGTGTG AAGTGACTCA TTTAACCTCA CCACAACGCC
3901 ACAACGCAGG GATTATGCAG GTAACCTATT TCCCAGATGA GGAAGATAAG
3951 GCCCAAGGAG GTGAAATGCC TTTCCCAGAG TTACACAGAG TGCTGGAGCT
4001 GGGAATACTG ACCCAGGCAG TCTAGCTCTT AACAGCTCAC TCCACTGTTT
4051 CCCTGGAGGT GATGCACAGA TGTCACTGGG AAACCCAAAG GAGAGGGGGT
4101 TGGCTGTGTG TGTGTGTGTT GGGCAGGCAG GTAAGGGGAG TAAGACCAGG
4151 ACAAGTGTTC CTGGCAAAGT TCCGGTGACA GCATTAAACA TTCAGATGGT
4201 GAGGGAGTTA ATATGGTTGG AGAACAACAA CTTTAGAGAG AGCAGAGGGG
4251 TCAGTTCACA ACCATCTGCT CAGGAGGGTC AAGATGGGTG GTCTTTATGC
4301 TGAAGGTCTG TGATTAGAGG AGCTGGTTGC TAAATTTTGA GGAGTACCTT
4351 TTGCTCTGTG CTGGACATCT AAATATGCAT GTTAACTGTG TTCTTTAACA
4401 TTTCCAGGAG ACTATAAACA TGTTTGAAAG GAACATTGTT GACATGGTAG
4451 GACTGTTTAT CGAAAATGTC CAAAGCCTAT ATCCTTTCTG TGATGACCTT
4501 CCCCATGGGG AGGTGCTACA GAGCCCCTGG GCTTGTCCCG GCCTCTGGAC
4551 AAAAGAATGT TCCACAGGGT CTGAGGAGGT TTCCCGACCC TCAGAACAAT
4601 GATGGCCTGG TTAGAGCTGT GGTTTGGATG CCCAGAGGGA CAACATCCAA
4651 ACTGTTTGCA GTAGGCTCCC AGCATGATTG TTCTCATATG AGTGATGTTC
4701 ACTAGGAAAT GACGCCCCCT GTGTTGCAGG CAAGCACACT CTGGGGTTGA
4751 GGCAACCCCC ACGTGGAAGA CACTATAAGG AGTACATCAG GTGAAATGTT
4801 AGGGTGAGGA GCCAACATCG GAGCATGGCC AACCCTTCTT CCACCCGAAC
4851 TCAGGGCACT CCACATGGGG CAAACTGCTG TGCTCCAGCT AGCAGCAGCC
4901 CTGTGGTCCT GCCCTCCTGG GGCTCACAGT CCCTCAGGGA GACAAGTTGT
4951 AGAGGCAACA AGTGGTGCCA AATGCACAGG GTGAGAAGCA GTTAACCCAG
5001 AGGCCAGGAG CCTCCATGCA GGAGGGAGAG AAGAGTGTGA TGGCAGGGGC
5051 CGAGGGTCCG TCCGAGGTGT GGGGCAGGGG CAGGGAGTCG AGGAAGGCCC
5101 AGGGTTCGGA GCTTGTGAGT GGACGGTGCT GCCAGCCAGA ATTTCCGAGC
5151 TCGCCTTGGG CCCTTAAAGT CTGTCTCCCG CCGTCTGAGA GCATCAGGGA
5201 CGCGCCGGGC CTGCTCCTCC CGGGCCTTTG CTTAACTCGG GGCTGCACGA
5251 TGGCTCAGTG CCGGGACCTG GAGAATCACC ACCACGACAA GCTCCTGGAG
5301 ATCTCTATCA GCACCCTGGA GAAGATTGTC GAGGGCGACC TGGACGAGGA
5351 CCTGCCTAAC GACCTGCGCG CGCTTTTTGT CGATAAAGAT ACGATTGTTA
5401 ATGCTGTCGG GGCATCGCAC GACATCCACC TCCTGAAGAT TGACAATCGA
5451 GAAGATGAGC TGGTGACCAG AATCAACTCT TGGTGTACAC GTTTAATAGA
5501 CAGGATTCAC AAGGATGAGA TCATGAGGAA CCGCAAGCGC GTGAAGGAGA
5551 TCAATCAGTA CATCGACCAC ATGCAGAGCG AACTGGACAA CCTGGAATGT
5601 GGCGACATCC TAGACTAGAT GAATGTCAGC CACAGGAGCT TCTTCAAAAC
5651 ATAGCACCAG CCCCAGCCAG GAGAAGGAAG TGCACACGCC TCACCCGCAC
5701 CTCTAGAGAG TTGCTGGGCA TCTCTCAACC GCGATCCCCA ACACCATTCT
5751 TCCCCCACCC CTGGAAAAAC TTCCAAAAGT AGAGAAAATA AAGGACTCAT
5801 TTCACAAAAA AAAAAAAAAA AA



BLAST Results 



Entry KS1292248 from database EMBL:
human STS SHGC-53917.
Score = 874, P = 3.3e-33, identities = 180/185



Medline entries 



No Medline entry 
Peptide information for frame 1



ORF from 202 bp to 876 bp; peptide length: 225 
Category: similarity to known protein 
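The peptide sequences in these reports are conceptual translations of the listed ORFs. The sketch below shows such a translation. It assumes the standard genetic code and an already cleaned, contiguous, in-frame nucleotide string. The function name is illustrative and not part of the original analysis software. The codon table is built from the standard TCAG-ordered 64-letter amino-acid string rather than written out by hand.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str, to_stop: bool = True) -> str:
    """Translate an in-frame nucleotide string codon by codon.

    Trailing bases that do not complete a codon are ignored. With to_stop=True,
    translation ends at the first stop codon, which is not included in the output.
    """
    orf = orf.upper()
    peptide = []
    for i in range(0, len(orf) - len(orf) % 3, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # 'X' for codons containing N etc.
        if aa == "*" and to_stop:
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    # First 48 bases of the DKFZphutel_20mll ORF (starting at bp 202).
    print(translate("ATGAACCAGCCGTGCAACTCGATGGAGCCGAGGGTGATGGACGATGAC"))
    # -> MNQPCNSMEPRVMDDD
```

The output matches the first 16 residues of the peptide listed below.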



1 MNQPCNSMEP RVMDDDMLKL AVGDQGPQEE AGQLAKQEGI LFKDVLSLQL 

51 DFRNILRIDN LWQFENLRKL QLDNNIIEKI EGLENLAHLV WLDLSFNNIE 

101 TIEGLDTLVN LEDLSLFNNR ISKIDSLDAL VKLQVLSLGN NRIDNMMNII

151 YLRRFKCLRT LSLSRNPISE AEDYKMFICA YLPDLMYLDY RRIDDHTASV 

201 SLSVSQPCET DSSSPQVSWK RGIEE 

BLASTP hits 

Entry S68209 from database PIR: 

sds22 protein homolog - human >TREMBL:HSSDS22MR_1 gene: "sds22";

product: "yeast sds22 homolog"; H. sapiens sds22-like mRNA

Score = 234, P = 1.2e-19, identities = 61/143, positives = 93/143

Entry A38439 from database PIR:

suppressor protein sds22(+) - fission yeast (Schizosaccharomyces pombe)
>TREMBL:SPSDS22_1 gene: "sds22+"; S. pombe sds22+ gene, complete cds.
Score = 208, P = 5.6e-17, identities = 52/127, positives = 71/127

Entry S43988 from database PIR:

protein suppressor sds22 - fission yeast (Schizosaccharomyces pombe)
>SWISSPROT:SD22_SCHPO PROTEIN PHOSPHATASES PP1 REGULATORY SUBUNIT
SDS22. >TREMBL:SPAC4A8_12 gene: "sds22"; product: "phosphatases pp1
regulatory subunit"; S. pombe chromosome I cosmid c4A8.
Score = 208, P = 8.5e-17, identities = 52/127, positives = 71/127

Entry CEK10D2_5 from database TREMBL:

gene: "K10D2.1"; Caenorhabditis elegans cosmid K10D2.

Score = 214, P = 3.6e-16, identities = 50/125, positives = 75/125



Alert BLASTP hits for DKFZphutel_20mll, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_20mll, frame 1 



Report for DKFZphutel_20mll.1



[LENGTH] 225 

[MW] 25955.87 

[pI] 4.63

[HOMOL] PIR:S68209 sds22 protein homolog - human 1e-18

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL193c] 2e-11

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL193c] 2e-11

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YKL193c] 2e-11

[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YOR373w] 2e-06

[FUNCAT] 01.03.10 metabolism of cyclic and unusual nucleotides [S. cerevisiae, YJL005w] 3e-05

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YJL005w] 3e-05

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YJL005w] 3e-05

[FUNCAT] 10.04.03 second messenger formation [S. cerevisiae, YJL005w] 3e-05

[FUNCAT] 04.07 rna transport [S. cerevisiae, YPL169c] 9e-04

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR065w] 9e-04

[EC] 4.6.1.1 Adenylate cyclase 2e-06

[PIRKW] nucleus 5e-16

[PIRKW] duplication 2e-06

[PIRKW] tandem repeat 2e-06

[PIRKW] cAMP biosynthesis 2e-06

[PIRKW] glycoprotein 2e-06

[PIRKW] phosphorus-oxygen lyase 2e-06

[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 5e-16

[SUPFAM] fibromodulin 3e-07

[SUPFAM] yeast adenylate cyclase catalytic domain homology 2e-06

[SUPFAM] yeast adenylate cyclase 2e-06

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 1

[KW] All_Alpha

SEQ MNQPCNSMEPRVMDDDMLKLAVGDQGPQEEAGQLAKQEGILFKDVLSLQLDFRNILRIDN 

PRD ccccccccccccccchhhhhhcccccchhhhhhhhhhhchhhhhhhhhcccccccccccc 

SEQ LWQFENLRKLQLDNNIIEKIEGLENLAHLVWLDLSFNNIETIEGLDTLVNLEDLSLFNNR 

PRD hhhhhhhhhhhhcccccccccccchhhhhhhhcccccccccccccchhhhhhhhhccccc 

SEQ ISKIDSLDALVKLQVLSLGNNRIDNMMNIIYLRRFKCLRTLSLSRNPISEAEDYKMFICA

PRD cccchhhhhhhhhhhhhccccccccccccccchhhhhhhhhcccccccccchhhhhhhhh 

SEQ YLPDLMYLDYRRIDDHTASVSLSVSQPCETDSSSPQVSWKRGIEE 

PRD hhcccccccccccccchhhhhhhhccccccccccccccccccccc 



Prosite for DKFZphutel_20mll.1

PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005
PS00006 122->126 CK2_PHOSPHO_SITE PDOC00006
PS00006 169->173 CK2_PHOSPHO_SITE PDOC00006



(No Pfam data available for DKFZphutel_20mll.1)
DKFZphutel_20m24



group: metabolism 

DKFZphutel_20m24 encodes a novel 611 amino acid protein with similarity to a hypothetical
C. elegans protein and to the yeast Alg9 protein.

This protein is a putative mannosyl transferase that is involved in the assembly of the core
oligosaccharide Glc3Man9GlcNAc2.

The new protein can find application in modulating the glycosylation of proteins and as a new
enzyme for biotechnological production processes.



strong similarity to S.cerevisiae Alg9p 

complete cDNA, complete cds, potential start at bp 23, few EST hits
Alg9 is involved in the assembly of the core oligosaccharide
Glc3Man9GlcNAc2

HSAC381 corresponding genomic DNA (2 exons)
HSB8954 corresponding genomic DNA (1 exon)

Sequenced by AGOWA 

Locus: /map="ll" 

Insert length: 1986 bp 

Poly A stretch at pos. 1966, polyadenylation signal at pos. 1949 



1 TTCTTTTTTC CCCAGGCTTG CCATGGCTAG TCGAGGGGCT CGGCAGCGCC 
51 TGAAGGGCAG CGGGGCCAGC AGTGGGGATA CGGCCCCGGC TGCGGACAAG 
101 CTGCGGGAGC TGCTGGGCAG CCGAGAGGCG GGCGGCGCGG AGCACCGGAC 
151 CGAGTTATCT GGGAACAAAG CAGGACAAGT CTGGGCACCT GAAGGATCTA 
201 CTGCTTTCAA GTGTCTGCTT TCAGCAAGGT TATGTGCTGC TCTCCTGAGC 
251 AACATCTCTG ACTGTGATGA AACATTCAAC TACTGGGAGC CAACACACTA 
301 CCTCATCTAT GGGGAAGGGT TTCAGACTTG GGAATATTCC CCAGCATATG
351 CCATTCGCTC CTATGCTTAC CTGTTGCTTC ATGCCTGGCC AGCTGCATTT 
401 CATGCAAGAA TTCTACAAAC TAATAAGATT CTTGTGTTTT ACTTTTTGCG 
451 ATGTCTTCTG GCTTTTGTGA GCTGTATTTG TGAACTTTAC TTTTACAAGG
501 CTGTGTGCAA GAAGTTTGGG TTGCACGTGA GTCGAATGAT GCTAGCCTTC 
551 TTGGTTCTCA GCACTGGCAT GTTTTGCTCA TCATCAGCAT TCCTTCCTAG 
601 TAGCTTCTGT ATGTACACTA CGTTGATAGC CATGACTGGA TGGTATATGG
651 ACAAGACTTC CATTGCTGTG CTGGGAGTAG CAGCTGGGGC TATCTTAGGC 
701 TGGCCATTCA GTGCAGCTCT TGGTTTACCC ATTGCCTTTG ATTTGCTGGT 
751 CATGAAACAC AGGTGGAAGA GTTTCTTTCA TTGGTCGCTG ATGGCCCTCA 
801 TACTATTTCT GGTGCCTGTG GTGGTCATTG ACAGCTACTA TTATGGGAAG 
851 TTGGTGATTG CACCACTCAA CATTGTTTTG TATAATGTCT TTACTCCTCA 
901 TGGACCTGAT CTTTATGGTA CAGAACCCTG GTATTTCTAT TTAATTAATG 
951 GATTTCTGAA TTTCAATGTA GCCTTTGCTT TGGCTCTCCT AGTCCTACCA 
1001 CTGACTTCTC TTATGGAATA CCTGCTGCAG AGATTTCATG TTCAGAATTT 
1051 AGGCCACCCG TATTGGCTTA CCTTGGCTCC AATGTATATT TGGTTTATAA 
1101 TTTTCTTCAT CCAGCCTCAC AAAGAGGAGA GATTTCTTTT CCCTGTGTAT 
1151 CCACTTATAT GTCTCTGTGG CGCTGTGGCT CTCTCTGCAC TTCAGAAATG 
1201 TTACCACTTT GTGTTTCAAC GATATCGCCT GGAGCACTAT ACTGTGACAT 
1251 CGAATTGGCT GGCATTAGGA ACTGTCTTCC TGTTTGGGCT CTTGTCATTT 
1301 TCTCGCTCTG TGGCACTGTT CAGAGGATAT CACGGGCCCC TTGATTTGTA 
1351 TCCAGAATTT TACCGAATTG CTACAGACCC AACCATCCAC ACTGTCCCAG 
1401 AAGGCAGACC TGTGAATGTC TGTGTGGGAA AAGAGTGGTA TCGATTTCCC
1451 AGCAGCTTCC TTCTTCCTGA CAATTGGCAG CTTCAGTTCA TTCCATCAGA
1501 GTTCAGAGGT CAGTTACCAA AACCTTTTGC AGAAGGACCT CTGGCCACCC 
1551 GGATTGTTCC TACTGACATG AATGACCAGA ATCTAGAAGA GCCATCCAGA 
1601 TATATTGATA TCAGTAAATG CCATTATTTA GTGGATTTGG ACACCATGAG 
1651 AGAAACACCC CGGGAGCCAA AATATTCATC CAATAAAGAA GAATGGATCA 
1701 GCTTGGCCTA TAGACCATTC CTTGATGCTT CTAGATCTTC AAAGCTGCTG 
1751 CGGGCATTCT ATGTCCCCTT CCTGTCAGAT CAGTATACAG TGTACGTAAA 
1801 CTACACCATC CTCAAACCCC GGAAAGCAAA GCAAATCAGG AAGAAAAGTG 
1851 GAGGTTAGCA ACACACCTGT GGCCCCAAAG GACAACCATC TTGTTAACTA 
1901 TTGATTCCAG TGACCTGACT CCCTGCAAGT CATCGCCTGT AACATTTGTA 
1951 ATAAAGGTCT TCTGACATGA AAAAAAAAAA AAAAAA 



BLAST Results 



Entry HSAC381 from database EMBL: 

Homo sapiens chromosome 11 pac pDJl59ol, complete sequence. 
Length = 42,771

Entry HSB8954 from database EMBL: 
cSRL-50A3-u cSRL flow sorted Chromosome 11 specific cosmid Homo
sapiens genomic clone CSRL-50A3.
Length = 601



Medline entries 



96293493: 

Stepwise assembly of the lipid-linked oligosaccharide in the 
endoplasmic reticulum of Saccharomyces cerevisiae:
identification of the ALG9 gene encoding a putative 
mannosyl transferase. 



Peptide information for frame 2 



ORF from 23 bp to 1855 bp; peptide length: 611 
Category: strong similarity to known protein 
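ORF coordinates and peptide lengths in these reports are related by simple codon arithmetic. The minimal Python sketch below rests on three assumptions inferred from the values in this listing rather than stated by it: coordinates are 1-based and inclusive, the stop codon is excluded, and the frame number is ((start - 1) mod 3) + 1. It checks the three ORFs reported in this section. It also translates the first codons of DKFZphutel_20m24 with the standard genetic code. The function names are illustrative only.

```python
# Sketch: codon arithmetic and translation for the ORFs in this listing.
# Assumptions inferred from the reports (not stated there): coordinates are
# 1-based and inclusive, the stop codon is excluded, and
# frame = ((start - 1) % 3) + 1.

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TCAG x TCAG x TCAG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (peptide length, reading frame) for a 1-based, inclusive ORF."""
    span = end - start + 1
    if span % 3:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3, (start - 1) % 3 + 1


def translate(dna: str) -> str:
    """Translate complete codons; '*' is a stop, 'X' an unrecognised codon."""
    dna = dna.upper().replace(" ", "")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


if __name__ == "__main__":
    print(orf_summary(23, 1855))   # (611, 2)  DKFZphutel_20m24
    print(orf_summary(64, 1803))   # (580, 1)  DKFZphutel_22d2
    print(orf_summary(320, 892))   # (191, 2)  DKFZphutel_21dl5
    bases_1_50 = "TTCTTTTTTC CCCAGGCTTG CCATGGCTAG TCGAGGGGCT CGGCAGCGCC"
    # Bases 23-49 are indices 22..48 once the spaces are removed.
    print(translate(bases_1_50.replace(" ", "")[22:49]))  # MASRGARQR
```

The translated start, MASRGARQR, agrees with the beginning of the frame 2 peptide listed below.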



1 MASRGARQRL KGSGASSGDT APAADKLREL LGSREAGGAE HRTELSGNKA 
51 GQVWAPEGST AFKCLLSARL CAALLSNISD CDETFNYWEP THYLIYGEGF 
101 QTWEYSPAYA IRSYAYLLLH AWPAAFHARI LQTNKILVFY FLRCLLAFVS 
151 CICELYFYKA VCKKFGLHVS RMMLAFLVLS TGMFCSSSAF LPSSFCMYTT 
201 LIAMTGWYMD KTSIAVLGVA AGAILGWPFS AALGLPIAFD LLVMKHRWKS 
251 FFHWSLMALI LFLVPVVVID SYYYGKLVIA PLNIVLYNVF TPHGPDLYGT 
301 EPWYFYLING FLNFNVAFAL ALLVLPLTSL MEYLLQRFHV QNLGHPYWLT 
351 LAPMYIWFII FFIQPHKEER FLFPVYPLIC LCGAVALSAL QKCYHFVFQR 
401 YRLEHYTVTS NWLALGTVFL FGLLSFSRSV ALFRGYHGPL DLYPEFYRIA 
451 TDPTIHTVPE GRPVNVCVGK EWYRFPSSFL LPDNWQLQFI PSEFRGQLPK 
501 PFAEGPLATR IVPTDMNDQN LEEPSRYIDI SKCHYLVDLD TMRETPREPK
551 YSSNKEEWIS LAYRPFLDAS RSSKLLRAFY VPFLSDQYTV YVNYTILKPR 
601 KAKQIRKKSG G 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_20m24, frame 2 

SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME
II., N = 1, Score = 957, P = 2.7e-96

PIR:S63177 mannosyl transferase (EC 2.4.1.-) - yeast (Saccharomyces
cerevisiae), N = 1, Score = 533, P = 2.3e-51


>SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME
II.

Length = 653

HSPs:

Score = 957 (143.6 bits), Expect = 2.7e-96, P = 2.7e-96
Identities = 206/514 (40%), Positives = 296/514 (57%) 

Query: 48 NKAGQVWAPEGSTAFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGFQTWEYSP 107

N W + FK LLS R+ A+ I+DCDE +NYWEP H +YGEGFQTWEYSP
Sbjct: 43 NNPDNDWPFSFGSVFKMLLSIRISGAIWGIINDCDEVYNYWEPLHLFLYGEGFQTWEYSP 102

Query: 108 AYAIRSYAYLLLHAWPAAFHARILQTNKILVFYFLRCLLAFVSCICELYFYKAVCKKFGL 167

YAIRSY Y+ LH PA+ A + KI+VF +R + + E Y + A+CKK +
Sbjct: 103 VYAIRSYFYIYLHYIPASLFANLFGDTKIVVFTLIRLTIGLFCLLGEYYAFDAICKKINI 162

Query: 168 HVSRMMLAFLVLSTGMFCSSSAFLPSSFCMYTTLIAMTGWYMDKTSIAVLGVAAGAILGW 227

R + F + S+GMF +S+AF+PSSFCM T + + + + + VA ++GW
Sbjct: 163 ATGRFFILFSIFSSGMFLASTAFVPSSFCMAITFYILGAYLNENWTAGIFCVAFSTMVGW 222

Query: 228 PFSAALGLPIAFDLLVMKHRWKSFFHWSLMALILFLVPVVVIDSYYYGKLVIAPLNIVLY 287

PFSA LGLPI D+L++K F SL+ + V+ DS+Y+GK V+APLNI LY
Sbjct: 223 PFSAVLGLPIVADMLLLKGLRIRFILTSLVIGLCIGGVQVITDSHYFGKTVLAPLNIFLY 282

Query: 288 NVFTPHGPDLYGTEPWYFYLINGFLNFNVAFALALLVLPLTSLMEYLLQRFHVQNLGHPY 347 

NV + GP LYG EP FY+ N F N+N+ A PL+ + Y + + Q+ 
Sbjct: 283 NVVSGPGPSLYGEEPLSFYIKNLFNNWNIVIFAAPFGFPLS--LAYFTKVWMSQDRNVAL 340 

Query: 348 WLTLAPMYI WFIIFFIQPHKEERFLFPVYPLICLCGAVALSALQKCYHFVFQR 400 

+ AP+ + W +IF Q HKEERFLFP+YP I A+AL A + ++ 
Sbjct: 341 YQRFAPIILLAVTTAAWLLIFGSQAHKEERFLFPIYPFIAFFAALALDATNR LCLKK 397 

Query: 401 YRLEHYTVTSNWLALGTVFLFGLLSFSRSVALFRGYHGPLDLYPEFYRIATDPTIHTVPE 460 

++ N L++ + F +LS SR+ ++ Y +++Y T+ T + 

Sbjct: 398 LGMO NILSI LFI LCFAI LSASRTYSIHNN YGSHVEI YRSLNAELTNRT-NFKNF 450 

Query: 461 GRPVNVCVGKEWYRFPSSFLLPDNW QLQFI PSEFRGQLPKPFAEGPL ATRI 511

P+ VCVGKEW+RFPSSF +P +++FI SEFRG LPKPF + TR 

Sbjct: 451 HDPIRVCVGKEWHRFPSSFFIPQTVSDGKKVEMRFIQSEFRGLLPKPFLKSDKLVEVTRH 510 

Query: 512 VPTDMNDQNLEEPSRYIDISKCHYLVDLDTMRETPREPKYSSNKEEW 558 

+PT+MN+ N EE SRY+D+ C Y+VD+D M ++ REP + ++ + 
Sbjct: 511 IPTEMNNLNQEEISRYVDLDSCDYVVDVD-MPQSDREPDFRKMRQNY 556 
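The percentages in these HSP summaries appear to be truncated toward zero rather than rounded. For example, the IE175 alignment reported for DKFZphutel_21dl5 has 36/103 identities, which is 34.95% but is reported as 34%. The minimal Python sketch below recomputes both percentages from a summary line under that assumption. The regular expression and function names are illustrative and are not part of any BLAST tool.

```python
import re

# Matches summary lines such as "Identities = 206/514 (40%), Positives = 296/514 (57%)".
HSP_SUMMARY = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+).*?Positives\s*=\s*(\d+)/(\d+)"
)


def truncated_percent(count: int, length: int) -> int:
    """Percentage truncated toward zero (integer division), not rounded."""
    return 100 * count // length


def hsp_percentages(line: str) -> tuple[int, int]:
    """Return (identity %, positives %) recomputed from an HSP summary line."""
    match = HSP_SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not an HSP summary line: {line!r}")
    ident, ident_len, pos, pos_len = map(int, match.groups())
    return truncated_percent(ident, ident_len), truncated_percent(pos, pos_len)


if __name__ == "__main__":
    print(hsp_percentages("Identities = 206/514 (40%), Positives = 296/514 (57%)"))  # (40, 57)
    print(hsp_percentages("Identities = 36/103 (34%), Positives = 44/103 (42%)"))    # (34, 42)
    print(round(100 * 36 / 103))  # 35: ordinary rounding would not match the reported 34%
```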



Pedant information for DKFZphutel_20m24, frame 2 



Report for DKFZphutel_20m24.2



[LENGTH] 611 

[MW] 69863.78 

[pI] 8.91

[HOMOL] SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME II. 2e-93

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YNL219c] 4e-69 

[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YNL219c] 4e-69

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YNL219c] 4e-59 

[PIRKW] glycosyltransferase 9e-68 

[PIRKW] transmembrane protein 9e-68 

[PIRKW] hexosyltransferase 9e-68

[PROSITE] MYRISTYL 9 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 7 

[PROSITE] PKC_PHOSPHO_SITE 6 

[PROSITE] ASN_GLYCOSYLATION 2

[KW] TRANSMEMBRANE 7 

[KW] LOW_COMPLEXITY 6.71 % 
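[MW] is a molecular-weight estimate for the translated peptide. This report does not say how Pedant computes it. A common approach, sketched below in Python as an assumption, sums standard average residue masses and adds one water for the free termini. The check also rejects letters outside the 20 standard residue codes, such as an OCR-garbled O. The result is not guaranteed to equal the [MW] value above, because the mass table used by the original pipeline is unknown.

```python
# Average residue masses (Da): amino-acid average masses minus one water each.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # average mass of H2O restored at the chain termini


def average_mw(peptide: str) -> float:
    """Average molecular weight (Da) of a linear peptide in one-letter code."""
    peptide = peptide.upper()
    unknown = set(peptide) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    print(round(average_mw("G"), 2))   # 75.07, free glycine
    print(round(average_mw("GG"), 2))  # 132.12, glycylglycine
```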



SEQ MASRGARQRLKGSGASSGDTAPAADKLRELLGSREAGGAEHRTELSGNKAGQVWAPEGST 

SEG 

PRD ccchhhhhhhcccccccccccchhhhhhhhhccccccccccceeecccccccccccccch 

MEM MMMMMM 

SEQ AFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGFQTWEYSPAYAIRSYAYLLLH 

SEG . . .xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhccccceeeccccceeeeeccccceeecccchhhhhhhhhhhc 

MEM MMMMMMMMMMMMMMMMM M 

SEQ AWPAAFHARILQTNKILVFYFLRCLLAFVSCICELYFYKAVCKKFGLHVSRMMLAFLVLS 

SEG 

PRD cchhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ TGMFCSSSAFLPSSFCMYTTLIAMTGWYMDKTSIAVLGVAAGAILGWPFSAALGLPIAFD 

SEG xxxxxxxxxxxxx 

PRD cceeeeccccccchhhhhhhhhhhhcccccccceeeeeehhhhhhccceeeeeecchhhh 

MEM MMMMMMMMMMMMMM 

SEQ LLVMKHRWKSFFHWSLMALILFLVPVVVIDSYYYGKLVIAPLNIVLYNVFTPHGPDLYGT 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhheeeeeeeecccccccccccceeeeeeeecccccccccc 

MEM MMMMMMM . MMMMMMMMMMMMMMMMMMMMM 

SEQ EPWYFYLINGFLNFNVAFALALLVLPLTSLMEYLLQRFHVQNLGHPYWLTLAPMYIWFII 

SEG xxxxxxxxxxxxxxx 

PRD cceeeeeecccccchhhhhhhhhhhhchhhhhhhhhhhhccccccceeeeehhhhhhhhh 

MEM ........ MMMMMMMMMMMMMMMMMMMMMMMMMMMMM ..

SEQ FFIQPHKEERFLFPVYPLICLCGAVALSALQKCYHFVFQRYRLEHYTVTSNWLALGTVFL

SEG

PRD hhcccchhhhhhcccceeehhhhhhhhhhhhhhhhhhhhhhhhheeeeccchhhhhhhee

MEM MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM .

SEQ FGLLSFSRSVALFRGYHGPLDLYPEFYRIATDPTIHTVPEGRPVNVCVGKEWYRFPSSFL

SEG

PRD eehhhhhhhheeecccccccccccceeeeccccccceeecccceeeeeeccccccccccc

MEM

SEQ LPDNWQLQFIPSEFRGQLPKPFAEGPLATRIVPTDMNDQNLEEPSRYIDISKCHYLVDLD

SEG

PRD ccccceeeecccccccccccccccccceeeeccccccccccccccceeeeeeceeeeecc

MEM

SEQ TMRETPREPKYSSNKEEWISLAYRPFLDASRSSKLLRAFYVPFLSDQYTVYVNYTILKPR

SEG

PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhheeeeeeeeecceeeeeeeeeecccc

MEM

SEQ KAKQIRKKSGG

SEG

PRD hhhhhhccccc

MEM



Prosite for DKFZphutel_20m24.2

PS00001  77->81    ASN_GLYCOSYLATION  PDOC00001
PS00001  593->597  ASN_GLYCOSYLATION  PDOC00001
PS00004  606->610  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  67->70    PKC_PHOSPHO_SITE   PDOC00005
PS00005  133->136  PKC_PHOSPHO_SITE   PDOC00005
PS00005  541->544  PKC_PHOSPHO_SITE   PDOC00005
PS00005  545->548  PKC_PHOSPHO_SITE   PDOC00005
PS00005  553->556  PKC_PHOSPHO_SITE   PDOC00005
PS00005  572->575  PKC_PHOSPHO_SITE   PDOC00005
PS00006  16->20    CK2_PHOSPHO_SITE   PDOC00006
PS00006  79->83    CK2_PHOSPHO_SITE   PDOC00006
PS00006  329->333  CK2_PHOSPHO_SITE   PDOC00006
PS00006  457->461  CK2_PHOSPHO_SITE   PDOC00006
PS00006  541->545  CK2_PHOSPHO_SITE   PDOC00006
PS00006  545->549  CK2_PHOSPHO_SITE   PDOC00006
PS00006  553->557  CK2_PHOSPHO_SITE   PDOC00006
PS00008  12->18    MYRISTYL           PDOC00008
PS00008  14->20    MYRISTYL           PDOC00008
PS00008  32->38    MYRISTYL           PDOC00008
PS00008  47->53    MYRISTYL           PDOC00008
PS00008  166->172  MYRISTYL           PDOC00008
PS00008  182->188  MYRISTYL           PDOC00008
PS00008  218->224  MYRISTYL           PDOC00008
PS00008  222->228  MYRISTYL           PDOC00008
PS00008  234->240  MYRISTYL           PDOC00008



(No Pfam data available for DKFZphutel_20m24.2)
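The Prosite hits above are pattern matches. Four of the patterns can be written as regular expressions:

- N-{P}-[ST]-{P} for PS00001
- [RK](2)-x-[ST] for PS00004
- [ST]-x-[RK] for PS00005
- [ST]-x(2)-[DE] for PS00006

The Python sketch below scans a peptide with them. It assumes 1-based start positions and uses lookahead so that overlapping sites are all reported. The MYRISTYL pattern is left out for brevity, and the helper name is hypothetical. Applied to residues 1-100 of the DKFZphutel_20m24 frame 2 peptide, it finds the sites at 16, 67, 77 and 79 that are reported above.

```python
import re

# PROSITE patterns as regular expressions; (?=...) lets overlapping hits be found.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"(?=N[^P][ST][^P])",  # PS00001
    "CAMP_PHOSPHO_SITE": r"(?=[RK]{2}.[ST])",   # PS00004
    "PKC_PHOSPHO_SITE": r"(?=[ST].[RK])",       # PS00005
    "CK2_PHOSPHO_SITE": r"(?=[ST].{2}[DE])",    # PS00006
}


def scan_sites(peptide: str, offset: int = 0) -> dict[str, list[int]]:
    """Return 1-based start positions per pattern; offset shifts a fragment."""
    return {
        name: [m.start() + 1 + offset for m in re.finditer(regex, peptide)]
        for name, regex in PATTERNS.items()
    }


if __name__ == "__main__":
    first_100 = (
        "MASRGARQRLKGSGASSGDTAPAADKLRELLGSREAGGAEHRTELSGNKA"
        "GQVWAPEGSTAFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGF"
    )
    print(scan_sites(first_100))
    # C-terminal residues 601-611: the cAMP site starts at 606.
    print(scan_sites("KAKQIRKKSGG", offset=600)["CAMP_PHOSPHO_SITE"])
```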
DKFZphutel_21dl5
group: uterus derived 

DKFZphutel_21dl5 encodes a novel 191 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown 

Sequenced by MediGenomix 
Locus: /chromosome="3"
Insert length: 5292 bp 

Poly A stretch at pos. 5273, polyadenylation signal at pos. 5252
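Each insert is annotated with the position of a poly A stretch and of a polyadenylation signal. The rule used by the annotation pipeline is not given here. The Python sketch below is one plausible reconstruction and rests on these assumptions:

- positions are 0-based offsets;
- the poly A stretch is the longest run of at least 8 A's;
- the signal is the last AATAAA (or the common ATTAAA variant) lying within 40 nt upstream of that run.

Under these assumptions, the last 42 bp of this insert (bases 5251-5292) give 5273 and 5252, the values stated above. The parameters min_run and window are illustrative choices, not values from the source.

```python
import re

SIGNALS = ("AATAAA", "ATTAAA")  # canonical signal and a common variant


def polya_features(seq: str, offset: int = 0, min_run: int = 8, window: int = 40):
    """Return (poly A start, signal start) as 0-based offsets, None where absent.

    The poly A stretch is the longest run of >= min_run A's (the most 3' one on
    ties); the signal is the last AATAAA/ATTAAA lying wholly within `window` nt
    upstream of that run.
    """
    seq = seq.upper()
    runs = list(re.finditer(f"A{{{min_run},}}", seq))
    if not runs:
        return None, None
    tail = max(runs, key=lambda m: (m.end() - m.start(), m.start()))
    start = tail.start()
    lo = max(0, start - window)
    hits = [seq.rfind(sig, lo, start) for sig in SIGNALS]
    hits = [h for h in hits if h != -1]
    signal = max(hits) + offset if hits else None
    return start + offset, signal


if __name__ == "__main__":
    # Bases 5251-5292 of DKFZphutel_21dl5; base 5251 is 0-based offset 5250.
    tail_21dl5 = "CAAATAAAAA" "ACCACAAGGT" "TCGAAAAAAA" "AAAAAAAAAA" "GG"
    print(polya_features(tail_21dl5, offset=5250))  # (5273, 5252)
```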

1 CTCCCACTAG TGTATGCCTT AATGGTGCCG CTCTTGTCCG CGTCTACGCT 

51 TGGGACCTTG GCTTCTGACT TGGAGAGTGT ACAGCTCTGC CCGACGGCAA 

101 CCCAGCTTGG GAAGAGAAGC CCCAGCGTGG GCTGGGGCTC AAGGCGCAGG 

151 AAGGCCGAGC CCGGCGCGGA CGCAGGCGGC TCCGGGCGGG CTCAGCACCC 

201 CCAGGCACCG TCTCCTAGTG ACCGCGGCGC TCGCGGGCCT GGCGGCCGTT 

251 GTCCGGGCGA CTGCGCAGCG CGGGCACCCC CGCGGCCCCT CCCCTGGGCG 

301 CGCGCGCGAC CTGGGTGCCA TGGCGGCAGC GGCGGTGACA GGCCAGCGGC 

351 CTGAGACCGC GGCGGCCGAG GAGGCCTCGA GGCCGCAGTG GGCGCCGCCA 

401 GACCACTGCC AGGCTCAGGC GGCGGCCGGG CTGGGCGACG GCGAGGACGC 

451 ACCGGTGCGT CCGCTGTGCA AGCCCCGCGG CATCTGCTCG CGCGCCTACT 

501 TCCTGGTGCT GATGGTGTTC GTGCACCTGT ACCTGGGTAA CGTGCTGGCG 

551 CTGCTGCTCT TCGTGCACTA CAGCAACGGC GACGAAAGCA GCGATCCCGG 

601 GCCCCAACAC CGTGCCCAGG GCCCCGGGCC CGAGCCCACC TTAGGTCCCC 

651 TCACCCGGCT GGAGGGCATC AAGGTGAGGA CCTCCCTGCC CCGCCGCGCT 

701 CCAGGCCCTG CACGGCTGAG CCCGAGAGGA CCGGCGCTCA GCCCGGGTCC 

751 CCACGCTGCC CCCGGCGCTG CTCTGCGTCG GTCCCGCGCG CTCCCACTCA 

801 CTCGCCTGCT GTCGCTCTCC GGGCCGGGGC GACTTGGCCC TTTTTGGGCA 

851 GCGCGGTCTG GCGCCCCAGC TGCCCGCTGT GCGCCTTTTC CTTAGGTGGG 

901 GCACGAGCGT AAGGTCCAGC TGGTCACCGA CAGGGATCAC TTCATCCGAA 

951 CCCTCAGCCT CAAGCCGCTG CTCTTCGAAA TCCCCGGCTT CCTGACTGAT 

1001 GAAGAGTGTC GGCTCATCAT CCATCTGGCG CAGATGAAGG GGTTACAGCG 

1051 CAGCCAGATC CTGCCTACTG AAGAGTATGA AGAGGCAATG AGCACTATGC 

1101 AGGTCAGCCA GCTGGACCTC TTCCGGCTGC TGGACCAGAA CCGTGATGGG 

1151 CACCTTCAGC TCCGTGAGGT TCTGGCCCAG ACTCGCCTGG GAAATGGATG 

1201 GTGGATGACT CCAGAGAGCA TTCAGGAGAT GTACGCCGCG ATCAAGGCTG 

1251 ACCCTGATGG TGACGGTGAG CTCACACCTC TGCACAGTCC TATCCCCGTG 

1301 AGCCTCCTGC CCACTCCCAG GTGCACAATT TTGAAAACTT GGGCCCTTCC 

1351 CCCACAGCCA GGCAGCCTCT CTGCACCCCT TTATAGTGGC CAGAGATGGG 

1401 GAGGTGAAGA TCCAGCCTTG CTTTTTACCC CTGGGAAGTA GGCAGGCAGC 

1451 CAGGCCCCCC GTTCCCCTTG GTGATGGTCT CGAGGGCAGT TCTTGGAGAC

1501 CCTTTTGATA ACATCAGGCA GAGTTGAGAG CCTGGGGACA GGAAGTAGGG 

1551 CTGCTAGTTG GCAGAGAACA GAGTGGGTGG AGCAGGAGCA AGGCGACAGT 

1601 GAGGCCAGCT AGAGCTTGGC TGTTTACCCT GCTCCATCCA TCTCTCCAGC 

1651 CAGACACGAG GTCCACCCCA GCAGACAGCT TCCCTGGTCT AAGTGAGGTC 

1701 TCCCTTGCCT TCCTCTTGTC CACCTGGAGT CATGCCGAAG CGCCTAAAAT 

1751 GGTAGTGCTG CTACCTGTGC TAACTGCTGG GGAGGGGTGG GCAGGGAAGC 

1801 TGTCATGCAA GTGGTGCCCC CTCTGGTAAT AACTCTCAGG AGGTTTCTGA 

1851 GGTGTGGTCA TCACCCTCAT GCCCAAATTC TGGACCAAGA GAGGAAGATA 

1901 CAGCAGTTAG AAAGGACTTG GAACAGTGGC TTTGCGGCTG GTGAACCAGA 

1951 GTGAAGAATC TGGCCGTGAC CTGGCTGCCA CACTGCTATA GGCCCCAGAA 

2001 CAGAGGTGGT GACAGTCTCA CAGCCCTTGA ATGTCCCCCA CCCTCAGAGG 

2051 AATCTGGGCC AAAGAGTGGA AGGTGATGTC CTTGGGTCAG CCAGAATAAC 

2101 ATGGAGCAAA GATACCAACT ACTCTTCCAG AACCCCAAGA GGGTAGAACC 

2151 CCTGCTTAAT GGTTTGAGCA GGGACAGTGG AGAATGTTCT CATGAGAGGG 

2201 GGTGGCCTGA CTTTCGTTGC TAAGTGGGCT GGTAACGCAG TAGGCAGGGC 

2251 TGGCGAAGTA GGTTCCACCC AGGATGAAAC CTGGGGTCAT GAGGAACTCC 

2301 CCGGGGGCTG GCCCTGCTTG CACCCTGGCG TATGTATGTA AGGCCCTGGA 

2351 TGAGGCCCAG CACTGCCTGC TCTCTCCTCA CCCTCCACAG GCCGGAGAGT 

2401 GGCCACCACT CTATATAGCC AGGCTGGAAG GCCAGGGTCC TGGCCATATG 

2451 GCTCAAGCTT CCTTTGGAGA ACCTTCTCTG GCCACTCTAA TAGGGGGTGG 

2501 GCCTCTTTCT TCTTAGGGCC AAATTAGGGC TTAAACTGAG AAAAGGAACT 

2551 GCTCTGGGTC TTCCTGTAAG GCCTGATGTG ACAGAAACCA GGTTCATCTG 

2601 ACCCAAAAGT CCAGGTGGGG GACAAGTGTA CAAGGCCCCT CAGTGCCTGA 

2651 GGTCAGGGGC TGCTGCTGCC TTTGGGGTAG GTAGGGAAGT GCAGCCTGCC 

2701 ACTGTTGCCT CCCAATATGG GCTTGGTGGG CATTGATGGT GGGTGCCCTG 

2751 TGCAGGAGTG CTGAGTCTGC AGGAGTTCTC CAACATGGAC CTTCGGGACT 

2801 TCCACAAGTA CATGAGGAGC CACAAGGCAG AGTCCAGTGA GCTGGTGCGG 
2851 AACAGCCACC ATACCTGGCT CTACCAGGGT GAGGGTGCCC ACCACATCAT

2901 GCGTGCCATC CGCCAGAGGT GAGCACCTGA AGCTGTTCTC ACTGGAGCAG

2951 GGGGAGAAGA CTGGGCAGGG CCTCCACAGA AGTCCTTGTC TGGGGCCAAG

3001 AGGACAGAAT GGATTAACCC ATTTGGGATT AAGTTCCATT TGTTAGACCA

3051 GGATTGGGAC CCACTGAAAG ACAGGCAATT AACAAAGGCA AATTAGCCCT

3101 CCTTGCAGGC ACACAATGGG CAACTGGGGT TAGATAGAGA TTGAGCACTT

3151 CTTTCTGATT AGATAAATGA CCTCTTATCT TTGACCCCTT ATCTGACCCC

3201 GTCACAGCAG GAAAAGGGTT TTTAAATAAA CAACTTTCTT CCAGGGAGGA

3251 GGACCTCAGG ACTCCCCGCC CCCTTTATTT AGTGGAAATG TCAACATTTC

3301 CACATAGCAG GTGTCTCTGT CTTTGGCATC TGAGGGAGAA GGATCATCAT

3351 GAGTAACCCC CTCCTGCTCT TACAGGGCCA GTCTGAGATG GCTTAAGGGA

3401 CTTCCAGGGG AGGTGGGTAG GGGCAAAGCT TGTGGCAGGC CTAGGGTCCA

3451 CCTTGGCCAG CTCCTTCAGA TCACCACCTT GCCTGGGGCT GCCCAGCCAA

3501 ATGCCTGCTG CCCACCAGGG TGCTGCGCCT CACTCGCCTG TCGCCTGAGA

3551 TCGTGGAGCT CAGCGAGCCG CTGCAGGTTG TTCGATATGG TGAGGGGGGC

3601 CACTACCATG CCCACGTGGA CAGTGGGCCT GTGTACCCAG AGACCATCTG

3651 CTCCCATACC AAGCTGGTAG CCAACGAGTC TGTACCCTTC GAGACCTCCT

3701 GCCGGCAAGT ATCTCCCAAC TGGGGGCTGC CTTCAATCCT CAGACCAGGA

3751 ACACCCATGA CACAGGCACA GCCCTGCACT GTGGGCGTGC CCCTTGGCAT

3801 GGGGCCAGGA GATCACTGGG TTATCCCGGT TAGTGATGCC CTCACCTCTC

3851 CCCACAAGTT GTTTACCCAA TGGCTGGAAA GGGGTGGCTA CTGGTCATCG

3901 TGACCACTGG AGTCAACACA GACTGATGTA CCCACAGACA CCAAAACTTG

3951 CCCCCTGAGT TCTGAAGCAA GGGGCAAGGC TGGGCCCCTA GCTTGTCCTG

4001 CCCATTCCTC CAGGTGTTGA TCTTGATTCC ACTTAGAGAA GCTGAAGCTG

4051 TGCCTCCCTC CCCTGTCAAG CCAGTTCTTT CCTCTTCAGG TGGCTGTTCT

4101 GGCCCAGCCC CTTCCCATCC CCAAGGAGCC CTTCAGCGCG CCCTGTTGCT

4151 TCTGCTAGCC TACCTTTCCC TGCCAGGCCC TTGCTCAGGG CCATGGCATT

4201 TAACTAAGTG CACCTGTGAT CTTGGCCAAA AAACCATTGC AACTCACAGT

4251 AAGAGACTGG GTTTCGGGGA AGGAGGGGCT AGGGACATTT TGGCACTGGC

4301 CTGCCCTATT GTCTCCCATC CTAGTCTGTC CTGGTCCCTG GCAACAGGAA

4351 CCTGGGCAGC TTATCCTGCC CACAGGTAAG CCCCTGGGAG CATCCACAAC

4401 TGGGGACCTG CTCAGTGCCC CCCCTGCCTT ACAGCTACAT GACAGTGCTG

4451 TTTTATTTGA ACAACGTCAC TGGTGGGGGC GAGACTGTTT TCCCTGTAGC

4501 AGATAACAGA ACCTACGATG AAATGGTAAG GGTCAACTGG GCTATTACTC

4551 TTGTGGGCTG GCAGGGGCTT AGACAAGTGA AGTACACACC TCTCCAGGTC

4601 TAAGGATGTG GGCCCAAATT ATTCCTTGGG CATATCTGGT TGGTTTCCCT

4651 TTGGTCACCC TTGGCTGGCC TGGCCATAGA GTGGGGACAG GTTGAACACC

4701 CCACCACCCT GCTGCCCACA GAGTCTGATT CAGGATGACG TGGACCTCCG

4751 TGACACACGG AGGCACTGTG ACAAGGGAAA CCTGCGTGTC AAGCCCCAAC

4801 AGGGCACAGC AGTCTTCTGG TACAACTACC TGCCTGATGG GCAAGGTTGG

4851 GTGGGTGACG TAGACGACTA CTCGCTGCAC GGGGGCTGCC TGGTCACGCG

4901 CGGCACCAAG TGGATTGCCA ACAACTGGAT TAATGTGGAC CCCAGCCGAG

4951 CGCGGCAAGC GCTGTTCCAA CAGGAGATGG CCCGCCTTGC CCGAGAAGGG

5001 GGCACCGACT CACAGCCCGA GTGGGCTCTG GACCGGGCCT ACCGCGATGC

5051 GCGCGTGGAA CTCTGAGGGA AGAGTTAGCC CCGGTTCCCA GCCGCGGGTC

5101 GCCAGTTGCC CAAGATCAGG GGTCCGGCTG TCCTTCTGTC CTGCTGCAGA

5151 CTAAAGGTCT GGCCAATGTC TTGCCCCACC CCGCCAGCCG CGATACGGCG

5201 CAGTTCCTAT ATTCATGTTA TTTATTGTGT ACTGACTCCA TCTGCCCCGT

5251 CAAATAAAAA ACCACAAGGT TCGAAAAAAA AAAAAAAAAA GG



BLAST Results 



Entry HSU64252 from database EMBL: 
Human STS sequence NOTI-225. 

Score = 959, P = 1.2e-36, identities = 195/199 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from the beginning to 351 bp; peptide length: 118 
Category: questionable ORF 
Classification: no clue 

1 LPLVYALMVP LLSASTLGTL ASDLESVQLC PTATQLGKRS PSVGWGSRRR 
51 KAEPGADAGG SGRAQHPQAP SPSDRGARGP GGRCPGDCAA RAPPRPLPWA
101 RARPGCHGGS GGDRPAA 



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphutel_21dl5, frame 1
No Alert BLASTP hits found 



Peptide information for frame 2 



ORF from 320 bp to 892 bp; peptide length: 191 
Category: putative protein 
Classification: no clue 

1 MAAAAVTGQR PETAAAEEAS RPQWAPPDHC QAQAAAGLGD GEDAPVRPLC
51 KPRGICSRAY FLVLMVFVHL YLGNVLALLL FVHYSNGDES SDPGPQHRAQ 
101 GPGPEPTLGP LTRLEGIKVR TSLPRRAPGP ARLSPRGPAL SPGPHAAPGA 
151 ALRRSRALPL TRLLSLSGPG RLGPFWAARS GAPAARCAPF P 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_21dl5, frame 2 

PIR:EDBE75 immediate-early protein IE175 - human herpesvirus 1, N = 2,
Score = 106, P = 0.0067



>PIR:EDBE75 immediate-early protein IE175 - human herpesvirus 1 
Length = 1,298 

HSPs: 

Score = 106 (15.9 bits), Expect = 6.7e-03, Sum P(2) = 6.7e-03
Identities = 36/103 (34%), Positives = 44/103 (42%) 



Query: 87 GDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVRTSLPRRA-PGPARLS-PRGPALSPGP 144

G + PGP G GP P P T+ G S R P PA S P GP +P 

Sbjct: 726 GRKRKSPGPARPPGGGGPRP PKTKKSGADAPGSDARAPLPAPAPPSTPPGPEPAPAQ 782 

Query: 145 HAAPGAALRRSRALPLT-RLLSLSGPGRLGPFWAARSGAPAARCAP 189 

AAP AA + +R P+ GP LG W + P+ AP 

Sbjct: 783 PAAPRAAAAQARPRPVAVSRRPAEGPDPLGG-WRRQPPGPSHTAAP 827 

Score = 40 (6.0 bits), Expect = 6.7e-03, Sum P(2) = 6.7e-03
Identities = 8/21 (38%), Positives = 9/21 (42%)

Query: 28 DHCQAQAAAGLGDGEDAPVRP 48 

DH + A G G AP P 
Sbjct: 212 DHAREARAVGRGPSSAAPAAP 232

Pedant information for DKFZphutel_21dl5, frame 1 



Report for DKFZphutel_21dl5.1

[LENGTH] 117 

[MW] 11797.32 

[pI] 10.68

[KW] Irregular 

[KW] SIGNAL_PEPTIDE 22

[KW] LOW_COMPLEXITY 38.46 % 

SEQ LPLVYALMVPLLSASTLGTLASDLESVQLCPTATQLGKRSPSVGWGSRRRKAEPGADAGG 

SEG xxxxxxxxxxxxxx 

PRD cccccccccccccccccccchhhhhhhhcccccccccccccccccccccccccccccccc

SEQ SGRAQHPQAPSPSDRGARGPGGRCPGDCAARAPPRPLPWARARPGCHGGSGGDRPAA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccccc



(No Prosite data available for DKFZphutel_21dl5.1)
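The [KW] LOW_COMPLEXITY figure appears to be the share of residues that SEG masks with x. The two SEG lines of this frame 1 report mask 45 of 117 residues, and 45/117 = 38.46%. The same rule gives 6.71% (41 of 611) and 29.84% (57 of 191) for the other two Pedant reports in this section. The minimal Python sketch below performs that calculation, assuming one x per masked residue. The function name is illustrative.

```python
def low_complexity_percent(seg_lines: list[str], length: int) -> float:
    """Percentage of residues masked with 'x' on SEG lines, to two decimals."""
    masked = sum(line.count("x") for line in seg_lines)
    return round(100 * masked / length, 2)


if __name__ == "__main__":
    # DKFZphutel_21dl5 frame 1: SEG lines with 14 and 31 masked residues.
    print(low_complexity_percent(["x" * 14, "x" * 31], 117))  # 38.46
```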
Pedant information for DKFZphutel_21dl5, frame 2
Report for DKFZphutel_21dl5.2 

[LENGTH] 191 

[MW] 19916.88 

[pI] 10.43

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 29.84 % 

SEQ MAAAAVTGQRPETAAAEEASRPQWAPPDHCQAQAAAGLGDGEDAPVRPLCKPRGICSRAY 

SEG

PRD ccceeeeccccchhhhhhhhhccccccchhhhhhhhcccccccccccccccccccchhhh 
MEM 



SEQ FLVLMVFVHLYLGNVLALLLFVHYSNGDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVR 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccceeeeee 

MEM MMMMMMMMMMMMMMMMM 

SEQ TSLPRRAPGPARLSPRGPALSPGPHAAPGAALRRSRALPLTRLLSLSGPGRLGPFWAARS 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx . .xxxx

PRD eeccccccccccccccccccccccccccchhhhhhhcccccceeecccccccchhhhhhc 



MEM 



SEQ GAPAARCAPFP 

SEG xxxxxxxxx. . 

PRD ccccccccccc 

MEM 



(No Prosite data available for DKFZphutel_21dl5.2)
(No Pfam data available for DKFZphutel_21dl5.2)
DKFZphutel_22d2



group: signal transduction 

DKFZphutel_22d2 encodes a novel 580 amino acid putative GTP-binding protein related to the 
protein. Additionally, the putative protein contains an EF-hand for calcium-binding. 

G-proteins are involved in various signal transduction pathways, transferring the signal of a
cellular receptor to an intracellular signal cascade.

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 



similarity to GTP-binding proteins 

complete cDNA, complete cds, potential start at Bp 64, EST hits 
complete cds according to K08F11.5 and YAL048c 



Sequenced by BMFZ 
Locus: /map="17" 



Insert length: 3247 bp 

Poly A stretch at pos. 3230, no polyadenylation signal found 



1 CTCCTGGTGA GAGGAGTCCA CTCCGTGCGT GCGGGCGGAG GCCGGCCCCC

51 GAGAGCCGCC GACATGAAGA AAGACGTGCG GATCCTGCTG GTGGGAGAAC

101 CTAGAGTTGG GAAGACATCA CTGATTATGT CTCTGGTCAG TGAAGAATTT

151 CCAGAAGAGG TTCCTCCCCG GGCAGAAGAA ATCACCATTC CAGCTGATGT

201 CACCCCAGAG AGAGTTCCAA CACACATTGT AGATTACTCA GAAGCAGAAC

251 AGAGTGATGA ACAACTTCAT CAAGAAATAT CTCAGGCTAA TGTCATCTGT

301 ATAGTGTATG CCGTTAACAA CAAGCATTCT ATTGATAAGG TAACAAGTCG

351 ATGGATTCCT CTCATAAATG AAAGAACAGA CAAAGACAGC AGGCTGCCTT

401 TAATATTGGT TGGGAACAAA TCTGATCTGG TGGAATATAG TAGTATGGAG

451 ACCATCCTTC CTATTATGAA CCAGTATACA GAAATAGAAA CCTGTGTGGA

501 GTGTTCAGCG AAAAACCTGA AGAACATATC AGAGCTCTTT TATTACGCAC

551 AGAAAGCTGT TCTTCATCCT ACAGGGCCCC TGTACTGCCC AGAGGAGAAG

601 GAGATGAAAC CAGCTTGTAT AAAAGCCCTT ACTCGTATAT TTAAAATATC

651 TGATCAAGAT AATGATGGTA CTCTCAATGA TGCTGAACTC AACTTCTTTC

701 AGAGGATTTG TTTCAACACT CCATTAGCTC CTCAAGCTCT GGAGGATGTC

751 AAGAATGTAG TCAGAAAACA TATAAGTGAT GGTGTGGCTG ACAGTGGGTT

801 GACCCTGAAA GGTTTTCTCT TTTTACACAC ACTTTTTATC CAGAGAGGGA

851 GACACGAAAC TACTTGGACT GTGCTTCGAC GATTTGGTTA TGATGATGAC

901 CTGGATTTGA CACCTGAATA TTTGTTCCCC CTGCTGAAAA TACCTCCTGA

951 TTGCACTACT GAATTAAATC ATCATGCATA TTTATTTCTC CAAAGCACCT

1001 TTGACAAGCA TGATTTGGAT AGAGACTGTG CTTTGTCACC TGATGAGCTT

1051 AAAGATTTAT TTAAAGTTTT CCCTTACATA CCTTGGGGGC CAGATGTGAA

1101 TAACACAGTT TGTACCAATG AAAGAGGCTG GATAACCTAC CAGGGATTCC

1151 TTTCCCAGTG GACGCTCACG ACTTATTTAG ATGTACAGCG GTGCCTGGAA

1201 TATTTGGGCT ATCTAGGCTA TTCAATATTG ACTGAGCAAG AGTCTCAAGC

1251 TTCAGCTGTT ACAGTGACAA GAGATAAAAA GATAGACCTG CAGAAAAAAC

1301 AAACTCAAAG AAATGTGTTC AGATGTAATG TAATTGGAGT GAAAAACTGT

1351 GGGAAAAGTG GAGTTCTTCA GGCTCTTCTT GGAAGAAACT TAATGAGGCA

1401 GAAGAAAATT CGTGAAGATC ATAAATCCTA CTATGCGATT AACACTGTTT

1451 ATGTATATGG ACAAGAGAAA TACTTGTTGT TGCATGATAT CTCAGAATCG

1501 GAATTTCTAA CTGAAGCTGA AATCATTTGT GATGTTGTAT GCCTGGTATA

1551 TGATGTCAGC AATCCCAAAT CCTTTGAATA CTGTGCCAGG ATTTTTAAGC

1601 AACACTTTAT GGACAGCAGA ATACCTTGCT TAATCGTAGC TGCAAAGTCA

1651 GACCTGCATG AAGTTAAACA AGAATACAGT ATTTCACCTA CTGATTTCTG

1701 CAGGAAACAC AAAATGCCTC CACCACAAGC CTTCACTTGC AATACTGCTG

1751 ATGCCCCCAG TAAGGATATC TTTGTTAAAT TGACAACAAT GGCCATGTAT

1801 CCGTAAGTAC TTGCTGTCTT CATTTTCATG TTGCATGGTT CATAACATTG

1851 CATGCCATTA TTAGCCATGA AGGGAATATC TTTGTCACAT AGGAATTGTT

1901 CAGCAACAGA AAGATACTTT GTAATGAGAA GGTACAAATT TGAGTAAATG

1951 CAAGTTTGGT TTGAATGCCA TAATAAAATG ATATAAACAG TGCTTCTGAC

2001 AATATCTGTA TATTTTTGAG CAGGCTGTAA CTATCTTAAT AGAATAGTAC

2051 AATAAAACAC AACCCCCCAC CCAGCATTAA AAAATAGTTT TACTGGAATA

2101 AAATGGGTTT GGCATCATGT TGTTTTATGC TTATAAAGCA TTTTCATATG

2151 AACAGAAAGT TTATATTTTT CTGTTTTTGA CCTTAGGTAT ATGAAGTTTT

2201 CTAAAATATT TTATTAATTT ATGTTGAAAT TGTGGGTATG CTTCAGTTAG

2251 GATATGTCTT TTTTAAGTGC TGTAAAGAGT AGTTGTAATT GGAATTTCTA

2301 CTGTATAAAT GTTTTACATT AAGTGTTACG AGCCACAAAT TTCATGTACA

2351 TTTATTATAT ATCTATACAT GCATATGCAC AAGCACATAA CTGTGGTCAT

2401 CTCTGTAGTT TACTAACTGC CTTAAAATTG CATGGTTCTT AATGGCATTC

2451 GCCTCAAGTA GTGTGTTTGT ATAAATTCTG TTTTGTAACA AAATAGTTTT

2501 TCAGGCAGTG CGTTTCTCAG GACTTTATAG CTTATTCTAC TTATTCTTAT

2551 GTTAGTCTCT AAATTATTTT TCTTCTTATG AAAACTACAG TGTAACACAG

2601 AGTAATAATC AAACATTGCT ATAAACCAAG AATGACATTT TTCAAAAAGG

2651 TGTTGATTTG TACAGATTTT TAAAGTCAGT TAACTTTACT GCTATTTTAT 

2701 TACCTAATAC TTTTTTTAGA TGCAACAAAC CCTTGAATTT CTATTTGTAT 

2751 TCGAAGACAA GTCATTCCTA TTATTATAGA ATAACCAAAA CCTTATTTAT 

2801 GTTTTACCTT TGCTTTAAAA CTCTCATGTA TGTTATCTAC AGAGAGGATC 

2851 ATTACAGAGA CAGACTCTCC CGAGACATGG GCCACACTGA TAGAATAGAG 

2901 AATTTGAGAA AAATCTGGGT CTTTCTAAAA ACTGCTTTGT AAGTTACTTT 

2951 TTCTTTATGA CTTCTGTGGG ATTTTGTTGA TATTTTCTTA GAGAATGACC 

3001 AAATCTCCTT TCTTGCCATA ATTAACATTT AGTAATTATG TAGAAACGCA 

3051 CTGCTTGGTC AGGCTTCCTG CCTAGCTATA TATTACGTTG TCTTCCTTAC 

3101 TACATAAATG TACTTCTTTA ATCTTGTGAT TACAGTAACT GCAAGTGTGT 

3151 TTTTACATCT GCATTTTTAA AACATTTTAC TGTAATTCTG TTGTGTGTGT 

3201 GTGTGTTATA TGATAAATGT ACATACATGG AAAAAAAAAA AAAAAAA 



BLAST Results 



Entry AC004527 from database EMBL: 

*** SEQUENCING IN PROGRESS *** NFl-related locus, Direct Submission; 

HTGS phase 1, 10 unordered pieces. 

Score = 1899, P = 1.1e-78, identities = 387/396



Entry HS148355 from database EMBL: 
human STS SHGC-31220. 
Score = 1826, P = 7.5e-78, identities = 388/406



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 64 bp to 1803 bp; peptide length: 580 
Category: similarity to known protein 



1 MKKDVRILLV GEPRVGKTSL IMSLVSEEFP EEVPPRAEEI TIPADVTPER
51 VPTHIVDYSE AEQSDEQLHQ EISQANVICI VYAVNNKHSI DKVTSRWIPL
101 INERTDKDSR LPLILVGNKS DLVEYSSMET ILPIMNQYTE IETCVECSAK
151 NLKNISELFY YAQKAVLHPT GPLYCPEEKE MKPACIKALT RIFKISDQDN
201 DGTLNDAELN FFQRICFNTP LAPQALEDVK NVVRKHISDG VADSGLTLKG
251 FLFLHTLFIQ RGRHETTWTV LRRFGYDDDL DLTPEYLFPL LKIPPDCTTE
301 LNHHAYLFLQ STFDKHDLDR DCALSPDELK DLFKVFPYIP WGPDVNNTVC
351 TNERGWITYQ GFLSQWTLTT YLDVQRCLEY LGYLGYSILT EQESQASAVT
401 VTRDKKIDLQ KKQTQRNVFR CNVIGVKNCG KSGVLQALLG RNLMRQKKIR
451 EDHKSYYAIN TVYVYGQEKY LLLHDISESE FLTEAEIICD VVCLVYDVSN
501 PKSFEYCARI FKQHFMDSRI PCLIVAAKSD LHEVKQEYSI SPTDFCRKHK
551 MPPPQAFTCN TADAPSKDIF VKLTTMAMYP



BLASTP hits



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_22d2, frame 1 

TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid
K08F11., N = 1, Score = 1357, P = 1.1e-138

TREMBL:SPCC320_4 gene: "SPCC320.04c"; product: "hypothetical protein";
S.pombe chromosome III cosmid c320., N = 1, Score = 889, P = 4.4e-89

TREMBL:CEUC47C12_3 gene: "C47C12.4"; Caenorhabditis elegans cosmid
C47C12., N = 2, Score = 408, P = 5.6e-74

PIR:S51971 probable membrane protein YAL048C - yeast (Saccharomyces
cerevisiae), N = 1, Score = 677, P = 1.3e-66



>TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid
K08F11.

Length = 625



HSPs: 
Score = 1357 (203.6 bits), Expect = 1.1e-138, P = 1.1e-138
Identities = 263/582 (45%), Positives = 380/582 (65%) 

Query: 4 DVRILLVGEPRVGKTSLIMSLVSEEFPEEVPPRAEEITIPADVTPERVPTHIVDYSEAEQ 63 

DVRI+L+G+ GKTSL+MSL+ +E+ + VP R + + IPADVTPE V T IVD S E+ 
Sbjct: 9 DVRIVLIGDEGCGKTSLVMSLLEDEWVDAVPRRLDRVLIPADVTPENVTTSIVDLSIKEE 68

Query: 64 SDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLINERTDKDSRLPLILVGNKSDLV 123 

+ + EI QANVIC+VY+V ++ ++D + ++W+PLI + + P+ILVGNKSD 
Sbjct: 69 DENWIVSEIRQANVICVVYSVTDESTVDGIQTKWLPLIRQSFGEYHETPVILVGNKSDGT 128

Query: 124 EYSSMETILPIMNQYTEIETCVECSAKNLKNISELFYYAQKAVLHPTGPLYCPEEKEMKP 183 

++ + ILPIM TE+ETCVECSA+ +KN+SE+FYYAQKAV++PT PLY + K++ 
Sbjct: 129 A-NNTDKILPIMEANTEVETCVECSARTMKNVSEIFYYAQKAVIYPTRPLYDADTKQLTD 187 

Query: 184 ACIKALTRIFKISDQDNDGTLNDAELNFFQRICFNTPLAPQALEDVKNVVRKHISDGVAD 243

KAL R+FKI D+DNDG L+D ELN FQ++CF PL ALEDVK V DGVA+ 
Sbjct: 188 RARKALIRVFKICDRDNDGYLSDTELNDFQKLCFGIPLTSTALEDVKRAVSDGCPDGVAN 247

Query: 244 SGLTLKGFLFLHTLFIQRGRHETTWTVLRRFGYDDDLDLTPEYLFPLLKIPPDCTTELNH 303 

L L GFL+LH LFI+RGRHETTW VLR+FGY+ L L+ +YL+P + IP C+TEL+ 
Sbjct: 248 DSLMLAGFLYLHLLFIERGRHETTWAVLRKFGYETSLKLSEDYLYPRITIPVGCSTELSP 307 

Query: 304 HAYLFLQSTFDKHDLDRDCALSPDELKDLFKVFPYIPWGPDVNNTVCTNERGWITYQGFL 363 

F+ + F+K+D D+D LSP EL++LF VP D + TN+RGW+TY G++ 

Sbjct: 308 EGVQFVSALFEKYDEDKDGCLSPSELQNLFSVCPVPVITKDNILALETNQRGWLTYNGYM 367 

Query: 364 SQWTLTTYLDVQRCLEYLGYLGYSILTEQESQAS AVTVTRDKKIDLQKKQTQRNVF 419 

+ W +TT +++ + EL YLG+ + +A ++ VTR++K DL+ T R VF 

Sbjct: 368 AYWNMTTLINLTQTFEQLAYLGFPVGRSGPGRAGNTLDSIRVTRERKKDLENHGTDRKVF 427 

Query: 420 RCNVIGVKNCGKSGVLQALLGRNLMRQKKIREDHKSYYAINTVYVYGQEKYLLLHDI 476 

+C V+G K+ GK+ +Q+L GR + +1 H S + IN V V + KYLLL ++ 
Sbjct: 428 QCLVVGAKDAGKTVFMQSLAGRGMADVAQIGRRH-SPFVINRVRVKEESKYLLLREVDVL 486

Query: 477 SESEFLTEAEIICDVVCLVYDVSNPKSFEYCARIFKQHFMDSRIPCLIVAAKSDLHEVKQ 536 

S + L E DVV +YD+SNP SF +CA +++++F ++ PC+++A K + EV Q 
Sbjct: 487 SPQDALGSGETSADVVAFLYDISNPDSFAFCATVYQKYFYRTKTPCVMIATKVEREEVDQ 546

Query: 537 EYSISPTDFCRKHKMPPPQAFTCNTADAPSKDIFVKLTTMAMYP 580 

+ + P +FCR+ ++P P F+ S IF +L MA+YP 

Sbjct: 547 RWEVPPEEFCRQFELPKPIKFSTGNIGQSSSPIFEQLAMMAVYP 590 



Pedant information for DKFZphutel_22d2, frame 1



Report for DKFZphutel_22d2.1



[LENGTH] 580

[MW] 66541.61

[pI] 5.56

[HOMOL] TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid K08F11. 1e-149

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YAL048c] 5e-81

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKR055w] 3e-11

[FUNCAT] 03.99 other cell growth, cell division and dna synthesis activities [S. cerevisiae, YNL098c] 8e-09

[FUNCAT] 10.04.07 g-proteins [S. cerevisiae, YNL098c] 8e-09

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YNL098c] 8e-09

[FUNCAT] 11.01 stress response [S. cerevisiae, YNL098c] 8e-09

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL098c] 8e-09

[FUNCAT] 01.03.13 regulation of nucleotide metabolism [S. cerevisiae, YNL098c] 8e-09

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YNL098c] 8e-09

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR101w] 4e-08

[FUNCAT] 11.10 cell death [S. cerevisiae, YOR101w] 4e-08

[FUNCAT] 10.02.07 g-proteins [S. cerevisiae, YPR165w] 7e-08

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YPR165w] 7e-08

[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YPR165w] 7e-08

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YFL005w] 9e-08

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YFL005w] 9e-08

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YFL005w] 9e-08

[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YNL093w] 1e-07

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YNL093w] 1e-07

[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL093w] 1e-07

[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YLR229c] 8e-07

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 8e-07

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YCR027c] 3e-06

[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210W] 9e-04

[BLOCKS] BL00410A Dynamin family proteins

[SCOP] d1plk 3.25.1.3.1 cH-p21 Ras protein [human (Homo sapiens)] 2e-42

[SCOP] d1guaa_ 3.25.1.3.10 Rap1A [Human (Homo sapiens)] 5e-59

[PIRKW] transmembrane protein 1e-79

[PIRKW] membrane trafficking 2e-06

[PIRKW] acetylated amino end 3e-09

[PIRKW] prenylated cysteine 3e-09

[PIRKW] signal transduction 1e-07

[PIRKW] transforming protein 3e-09

[PIRKW] immediate-early protein 8e-06

[PIRKW] alternative splicing 4e-08

[PIRKW] P-loop 1e-10

[PIRKW] lipoprotein 7e-10

[PIRKW] proto-oncogene 3e-09

[PIRKW] methylated carboxyl end 3e-09

[PIRKW] membrane protein 3e-09

[PIRKW] GTP binding 1e-10

[PIRKW] thiolester bond 7e-10

[SUPFAM] ras transforming protein 1e-10

[PROSITE] ATP_GTP_A 2

[PROSITE] MYRISTYL 3

[PROSITE] EF_HAND 1

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 14

[PROSITE] TYR_PHOSPHO_SITE 4

[PROSITE] PKC_PHOSPHO_SITE 5

[PROSITE] ASN_GLYCOSYLATION 3

[PFAM] Ras family (contains ATP/GTP binding P-loop)

[KW] Irregular

[KW] 3D



SEQ MKKDVRILLVGEPRVGKTSLIMSLVSEEFPEEVPPRAEEITIPADVTPERVPTHIVDYSE 

ljai- ...EEEEEEEETTTTCHHHHHHHHHHCCCCCCCCCCCCEEEEEEEETTEEEEEEEEECCC

SEQ AEQSDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLINERTDKDSRLPLILVGNKS 

ljai- CGGGHHHHHHHHHHTTEEEEEEETTTHHHHHHH-HHHHHHHHHHHCTTT-TCEEEEEETT 

SEQ DLVEYSSMETILPIMNQYTEIETCVECSAKNLKNISELFYYAQKAVLHPTGPLYCPEEKE 

ljai- TTTTTTTTHHHHHHHHHHHCCCE-EECTTTTTTTHHHHHH 

SEQ MKPACIKALTRIFKISDQDNDGTLNDAELNFFQRICFNTPLAPQALEDVKNVVRKHISDG

ljai- 

SEQ VADSGLTLKGFLFLHTLFIQRGRHETTWTVLRRFGYDDDLDLTPEYLFPLLKIPPDCTTE 

ljai- 

SEQ LNHHAYLFLQSTFDKHDLDRDCALSPDELKDLFKVFPYIPWGPDVNNTVCTNERGWITYQ 

ljai- 

SEQ GFLSQWTLTTYLDVQRCLEYLGYLGYSILTEQESQASAVTVTRDKKIDLQKKQTQRNVFR

ljai- 

SEQ CNVIGVKNCGKSGVLQALLGRNLMRQKKIREDHKSYYAINTVYVYGQEKYLLLHDISESE 

ljai- 

SEQ FLTEAEIICDVVCLVYDVSNPKSFEYCARIFKQHFMDSRIPCLIVAAKSDLHEVKQEYSI 

ljai- 

SEQ SPTDFCRKHKMPPPQAFTCNTADAPSKDIFVKLTTMAMYP 

ljai- 



Prosite for DKFZphutel_22d2.1



PS00001 118->122 ASN_GLYCOSYLATION PDOC00001

PS00001 154->158 ASN_GLYCOSYLATION PDOC00001

PS00001 346->350 ASN_GLYCOSYLATION PDOC00001

PS00004 411->415 CAMP_PHOSPHO_SITE PDOC00004

PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005

PS00005 105->108 PKC_PHOSPHO_SITE PDOC00005

PS00005 148->151 PKC_PHOSPHO_SITE PDOC00005

PS00005 247->250 PKC_PHOSPHO_SITE PDOC00005

PS00005 414->417 PKC_PHOSPHO_SITE PDOC00005

PS00006 59->63 CK2_PHOSPHO_SITE PDOC00006

PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006

PS00006 126->130 CK2_PHOSPHO_SITE PDOC00006

PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006

PS00006 143->147 CK2_PHOSPHO_SITE PDOC00006

PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006

PS00006 203->207 CK2_PHOSPHO_SITE PDOC00006

PS00006 311->315 CK2_PHOSPHO_SITE PDOC00006

PS00006 325->329 CK2_PHOSPHO_SITE PDOC00006

PS00006 370->374 CK2_PHOSPHO_SITE PDOC00006

PS00006 390->394 CK2_PHOSPHO_SITE PDOC00006

PS00006 477->481 CK2_PHOSPHO_SITE PDOC00006

PS00006 483->487 CK2_PHOSPHO_SITE PDOC00006

PS00006 541->545 CK2_PHOSPHO_SITE PDOC00006

PS00007 153->161 TYR_PHOSPHO_SITE PDOC00007

PS00007 376->384 TYR_PHOSPHO_SITE PDOC00007

PS00007 153->162 TYR_PHOSPHO_SITE PDOC00007

PS00007 448->457 TYR_PHOSPHO_SITE PDOC00007

PS00008 240->246 MYRISTYL PDOC00008

PS00008 425->431 MYRISTYL PDOC00008

PS00008 433->439 MYRISTYL PDOC00008

PS00017 11->19 ATP_GTP_A PDOC00017

PS00017 425->433 ATP_GTP_A PDOC00017

PS00018 197->210 EF_HAND PDOC00018



Pfam for DKFZphutel_22d2.1



HMM_NAME Ras family (contains ATP/GTP binding P-loop)

HMM *KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDFYtKTIEIDGKtIK 

++L+G+ VGK++L ++ EF+EE +P ++ T ++ +++ 
Query 6 RILLVGEPRVGKTSLIMSLVSEEFPEE-VPPR-AEEITIPADVTPERVP 52 

HMM LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFENIr . NWweEIr 

ID E+ + + + +A+++ +VY+++N+ S ++++ +W++ 1+ 
Query 53 THIVDYSEAEQSDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLIN 102 

HMM RHCDrDENVPIMLVGNKCDLEDQRQVStEEGQeFAREWGAI PFMETSAKT 

+ D+D+ P +LVGNK+DL + ++T + +E+SAK+ 

Query 103 ERTDKDSRLPLILVGNKSDLVEYSSMETILPIMNQYTEI-ETCVECSAKN 151 

HMM NiNVEEAFMEIvRellqrMqeqNqteNinidQpsrnrkrCCCIM* 

N+ E F+ + +++L + +++ +++++ + C+ 

Query 152 LKNISELFYYAQKAVLHPT GPLYCPEEKEMK-PACI — 186 
DKFZphutel_22el2



group: signal transduction 

DKFZphutel_22el2 encodes a novel 92 amino acid protein with similarity to yeast, C. elegans,
Drosophila and mammalian proteins.

The Drosophila cni and mammalian cornichon proteins are part of a signal transduction pathway
involving the EGF receptor.

The new protein can find application in modulating the cornichon-modulated signal transduction
pathway and EGF receptor signaling.

strong similarity to S. cerevisiae YGL054c and cornichon
complete cDNA, complete cds, EST hits

cornichon is required for signal transduction in EGF receptor
signaling

Sequenced by BMFZ 

Locus: unknown 

Insert length: 519 bp 

Poly A stretch at pos. 499, no polyadenylation signal found 



1 GTCGGGGCAT CCGAGCGGGT TTGACGGAAG GAGCGGCGGC GACGGAGGAG 

51 GAGGATGGAG GCGGTGGTGT TCGTCTTCTC TCTCCTCGAT TGTTGCGCGC 

101 TCATCTTCCT CTCGGTCTAC TTCATAATTA CATTGTCTGA TTTAGAATGT 

151 GATTACATTA ATGCTAGATC ATGTTGCTCA AAATTAAACA AGTGGGTAAT

201 TCCAGAATTG ATTGGCCATA CCATTGTCAC TGTATTACTG CTCATGTCAT 

251 TGCACTGGTT CATCTTCCTT CTCAACTTAC CTGTTGCCAC TTGGAATATA 

301 TATCGTATGA TCTTAGCTTT GATAAATGAC TGAAGCTGGA GAAGCCGTGG 

351 TTGAAGTCAG CCTACACTAC AGTGCACAGT TGAGGAGCCA GAGACTTCTT 

401 AAATCATCCT TAGAACCGTG ACCATAGCAG TATATATTTT CCTCTTGGAA 

451 CAAAAAACTA TTTTTGCTGT ATTTTTACCA TATAAAGTAT TTAAAAAACA 
501 TGAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



95300228: 

cornichon and the EGF receptor signaling process are necessary for both
anterior-posterior and dorsal-ventral pattern formation in Drosophila.



Peptide information for frame 1 



ORF from 55 bp to 330 bp; peptide length: 92 
Category: strong similarity to known protein 



1 MEAVVFVFSL LDCCALIFLS VYFIITLSDL ECDYINARSC CSKLNKWVIP 
51 ELIGHTIVTV LLLMSLHWFI FLLNLPVATW NIYRMILALI ND 
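The ORF coordinates can be checked against the peptide lengths: for every entry in this section, end - start + 1 is an exact multiple of three, and the coordinates appear to run from the A of the start ATG to the last base before the stop codon (for DKFZphutel_22el2, ATG begins at base 55 and bases 331-333 read TGA). The Python sketch below assumes that 1-based, stop-excluded convention, which is inferred from the data rather than stated in the report, and uses the standard genetic code; orf_length and translate are hypothetical helper names.

```python
# Minimal sketch: ORF length and translation with the standard genetic code.
# Assumption: ORF coordinates are 1-based, inclusive, and exclude the stop codon.

BASES = "TCAG"
# Standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_length(start: int, end: int) -> int:
    """Peptide length for an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span {span} bp is not a whole number of codons")
    return span // 3


def translate(dna: str) -> str:
    """Translate DNA codon by codon, ignoring whitespace and any trailing partial codon."""
    seq = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    # ORF coordinates as reported in this section; lengths 92, 304, 537, 196.
    for name, start, end in [
        ("DKFZphutel_22el2", 55, 330),
        ("DKFZphutel_22n2", 255, 1166),
        ("DKFZphutel_22o2", 326, 1936),
        ("DKFZphutel_23el3", 354, 941),
    ]:
        print(name, orf_length(start, end))
    # Bases 55-99 of the DKFZphutel_22el2 cDNA (first 15 codons of the ORF).
    print(translate("ATGGAGGCGG TGGTGTTCGT CTTCTCTCTC CTCGATTGTT GCGCG"))  # MEAVVFVFSLLDCCA
```

The compact 64-letter string is a common way to encode the standard code; a lookup table built from it avoids typing out all 64 codons by hand.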

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22el2, frame 1 

PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces
cerevisiae), N = 2, Score = 185, P = 5.7e-17

TREMBL:SPAC2C4_5 gene: "SPAC2C4.05"; product: "cornichon homolog";
S. pombe chromosome I cosmid c2C4., N = 1, Score = 163, P = 3.7e-12

PIR:S46084 probable membrane protein YBR210w - yeast (Saccharomyces
cerevisiae), N = 1, Score = 162, P = 4.8e-12

TREMBL:AF104398_1 product: "cornichon"; Homo sapiens cornichon mRNA,
complete cds., N = 1, Score = 141, P = 8e-10

SWISSPROT:CNI_DROVI CORNICHON PROTEIN., N = 1, Score = 139, P = 1.3e-09

>PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces 
cerevisiae) 

Length = 138

HSPs:

Score = 185 (27.8 bits), Expect = 5.7e-17, Sum P(2) = 5.7e-17
Identities = 35/85 (41%), Positives = 56/85 (65%)

Query: 1 MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIPELIGHTIVTV 60

M A +F+ +++ C +F V+F I +DLE DYIN CSK+NK + PE H +++ 
Sbjct: 1 MGAWLFILAWVNCINLFGQVHFTILYADLEADYINPIELCSKVNKLITPEAALHGALSL 60 

Query: 61 LLLMSLHWFIFLLNLPVATWNIYRM 85

L L++ +WF+FLLNLPV +N+ ++ 
Sbjct: 61 LFLLNGYWFVFLLNLPVLAYNLNKI 85 

Score = 37 (5.6 bits), Expect = 5.7e-17, Sum P(2) = 5.7e-17
Identities = 7/9 (77%), Positives = 9/9 (100%) 

Query: 82 IYRMILALI 90 

+YRMI+ALI 
Sbjct: 123 LYRMIMALI 131 
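The percentages on the Identities and Positives lines appear to be truncated to whole numbers rather than rounded: 56/85 is 65.9 % but is shown as 65 %, and 7/9 is 77.8 % but is shown as 77 %. The sketch below reproduces that display under the assumption of integer truncation, which is inferred from these values and not documented in the report; blast_percent is a hypothetical name.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated (not rounded), as the HSP lines appear to show."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


if __name__ == "__main__":
    # (matches, aligned length, percentage shown in this section)
    reported = [
        (35, 85, 41), (56, 85, 65),     # DKFZphutel_22el2 vs YGL054c, first HSP
        (7, 9, 77), (9, 9, 100),        # second HSP
        (80, 182, 43), (102, 182, 56),  # DKFZphutel_23el3 vs dog HSP27
    ]
    for matches, aligned, shown in reported:
        print(matches, aligned, blast_percent(matches, aligned), shown)
    # Rounding instead would give 66 for 56/85, not the 65 shown.
    print(round(100 * 56 / 85))
```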



Pedant information for DKFZphutel_22el2, frame 1 



Report for DKFZphutel_22el2.1



[LENGTH] 92

[MW] 10614.98

[pI] 5.04

[HOMOL] PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces cerevisiae) 5e-14

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YGL054c] 2e-15

[PIRKW] transmembrane protein 2e-11

[PROSITE] CK2_PHOSPHO_SITE 3 

[KW] SIGNAL_PEPTIDE 33 

[KW] TRANSMEMBRANE 2 



SEQ MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIPELIGHTIVTV 

PRD ccchhhhhhhhhhhhhhhhhhhheeeccccccccccccccccccceeehhhhhhhhhhhh 

MEM MMMMMMMMMM 

SEQ LLLMSLHWFIFLLNLPVATWNIYRMILALIND

PRD hhhhhhhheeecccccchhhhhhhhhhhhccc

MEM MMMMMMMMMMMMMMMMMMM..MMMMMMM....



Prosite for DKFZphutel_22el2.1

PS00006 9->13 CK2_PHOSPHO_SITE PDOC00006

PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006

PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006



(No Pfam data available for DKFZphutel_22el2.1)
DKFZphutel_22n2



group: uterus derived 

DKFZphutel_22n2 encodes a novel 304 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 

Locus: /map="553.3 cR from top of Chr11 linkage group"
Insert length: 1556 bp 

Poly A stretch at pos. 1534, no polyadenylation signal found 



1 ACAACAGGCT GGTTGCTTGG CGTGGAATCC TAAAGTGGCC TGGCTTTGAG 

51 ACTGGAGTGA GACCCCAGCC CTAGGCTGGG GTTCTTTCCA TTATAGAGGA 

101 GACGGATTCA GAAGGGCTAC AGACCAAGGT TGTTGAAAAC CAGACATATG 

151 ATGAGCGTCT AGAGATTAAC GACTCCGAAG AGGTTGCAAG TATTTATACT 

201 CCAACCCCAA GACACCAAGG ACTTCCTCGT TCTGCCCATC TTCCTAACAA 

251 GGCTATGGCT GATAACAGCA GTGATGAGTG TGAAGAGGAA AATAACAAGG

301 AGAAGAAGAA GACCTCACAG TTGACACCTC AACGGGGCTT TAGTGAAAAT 

351 GAGGATGACG ATGATGATGA TGATGATTCA TCTGAAACTG ATTCTGATTC 

401 TGATGATGAT GATGAAGAGC ATGGAGCCCC TCTGGAAGGG GCCTATGACC 

451 CTGCAGACTA TGAGCATTTG CCAGTTTCTG CTGAAATTAA GGAACTCTTC

501 CAGTACATCA GTAGGTACAC ACCTCAGTTG ATTGACCTGG ACCACAAACT 

551 GAAGCCTTTC ATTCCTGATT TTATCCCAGC TGTCGGGGAT ATTGATGCAT 

601 TCTTAAAGGT CCCACGTCCT GATGGAAAGC CTGACAACCT TGGCCTATTG 

651 GTATTGGATG AACCTTCTAC AAAGCAGTCA GACCCTACGG TGCTCTCACT 

701 CTGGTTAACA GAGAATTCTA AGCAGCACAA CATCACACAA CATATGAAAG 

751 TAAAAAGCCT AGAAGATGCA GAAAAGAATC CCAAAGCCAT TGACACGTGG 

801 ATTGAGAGCA TCTCTGAATT ACACCGTTCT AAGCCCCCTG CGACTGTGCA 

851 CTACACCAGG CCCATGCCCG ACATTGACAC GCTGATGCAG GAATGGTCCC 

901 CGGAGTTTGA AGAGCTTTTG GGCAAGGTAA GCCTGCCCAC GGCAGAGATT 

951 GATTGCAGCC TGGCAGAGTA CATTGACATG ATCTGTGCCA TTCTAGACAT 

1001 CCCTGTCTAC AAGAGTCGGA TCCAGTCCCT CCATCTGCTC TTTTCCCTCT 

1051 ACTCAGAATT CAAGAACTCA CAGCATTTTA AAGCTCTCGC TGAAGGCAAG 

1101 AAAGCATTCA CTCCTTCATC CAATTCCACC TCCCAAGCTG GAGACATGGA 

1151 GACATTAACC TTCAGCTGAG ACACTTCCCA AGCTGCTGTT TCAAGGCTGA 

1201 GCTGGCCCCT CTGCCCCAGC TGAGATGGAC AGATCGTTGT CAGCTACTTG 

1251 ATGTCCTTGC CCATGCCACA GCTTGGCTCA GGGGCAGTGC ATGTCCTGCT 

1301 GCCCTCTCTG CCAGAGGGCA CAGAACATGT TTGTTTAATG AACCTGCCTG 

1351 CCTCAGATTG CTGTCCCCGG GGAGTTAATG CATCTACACC ACTGTGGGGA 

1401 TTTGAGTTAT AAGAATTGGA ATTTCTGAGA TCCCATGGAG GTTAGATTGG 

1451 GAGGAAAGCT TAAAAGATGT CCTTTTTGTG AGAGGGATGG AATTGTTTTC 

1501 TTTCATTCGT AAAGTTAGTG AGTAAAGATT TTATAAATCA AAAAAAAAAA 

1551 AAAAAA 



BLAST Results 



Entry HS188252 from database EMBL: 
human STS WI-12265. 
Score = 2554, P = 4.1e-109, identities = 556/587



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 255 bp to 1166 bp; peptide length: 304 
Category: putative protein 
1 MADNSSDECE EENNKEKKKT SQLTPQRGFS ENEDDDDDDD DSSETDSDSD
51 DDDEEHGAPL EGAYDPADYE HLPVSAEIKE LFQYISRYTP QLIDLDHKLK 
101 PFIPDFIPAV GDIDAFLKVP RPDGKPDNLG LLVLDEPSTK QSDPTVLSLW 
151 LTENSKQHNI TQHMKVKSLE DAEKNPKAID TWIESISELH RSKPPATVHY 
201 TRPMPDIDTL MQEWSPEFEE LLGKVSLPTA EIDCSLAEYI DMICAILDIP 
251 VYKSRIQSLH LLFSLYSEFK NSQHFKALAE GKKAFTPSSN STSQAGDMET 
301 LTFS 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_22n2, frame 3 

PIR:S38149 SIS2 protein - yeast (Saccharomyces cerevisiae), N = 1,
Score = 132, P = 1e-05



>PIR:S38149 SIS2 protein - yeast (Saccharomyces cerevisiae) 
Length = 562 



HSPs: 



Score = 132 (19.8 bits), Expect = 1.0e-05, P = 1.0e-05
Identities = 24/63 (38%), Positives = 35/63 (55%)



Query: 3 DNSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEEHGAPLEG 62 

+ DE EEE++ E++ T +++DDDDDDDD + D D DDD++E A G 

Sbjct: 497 EEDDDEDEEEDDDEEEDTEDKNENNNDDDDDDDDDDDDDDDDDDDDDDDDEDEDEAETPG 556 

Query: 63 AYD 65 
D 

Sbjct: 557 IID 559 



Score = 122 (18.3 bits), Expect = 1.4e-04, P = 1.4e-04
Identities = 20/52 (38%), Positives = 33/52 (63%) 



Query: 4 NSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEE 55 

N+ +E ++E+ +E + T + + N+DDDDDDDD + D D DDDD++ 

Sbjct: 494 NNEEEDDDEDEEEDDDEEEDTEDKNENNNDDDDDDDDDDDDDDDDDDDDDDD 545



Pedant information for DKFZphutel_22n2, frame 3 



Report for DKFZphutel_22n2.3



[LENGTH] 304 

[MW] 34285.85

[pI] 4.37

[PROSITE] AMIDATION 1

[PROSITE] CAMP_PHOSPHO_SITE 2

[PROSITE] CK2_PHOSPHO_SITE 10

[PROSITE] PKC_PHOSPHO_SITE 1

[PROSITE] ASN_GLYCOSYLATION 3

[KW] All_Alpha

[KW] LOW_COMPLEXITY 11.84 %



SEQ MADNSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEEHGAPL

SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccchhhhhhchhhhhhcccccccccccccccccccccccccccccccccccccccc 

SEQ EGAYDPADYEHLPVSAEIKELFQYISRYTPQLIDLDHKLKPFIPDFIPAVGDIDAFLKVP 

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhccccccccccccccccccccccccccceeecc 

SEQ RPDGKPDNLGLLVLDEPSTKQSDPTVLSLWLTENSKQHNITQHMKVKSLEDAEKNPKAID 

SEG 

PRD ccccccccceeeeecccccccccccchhhhhhccccccccccccchhhhhhhhcccccch 

SEQ TWIESISELHRSKPPATVHYTRPMPDIDTLMQEWSPEFEELLGKVSLPTAEIDCSLAEYI 

SEG 

PRD hhhhhhhhhhcccccceeeeecccccchhhhhhcccchhhhhccccccccccchhhhhhh 

SEQ DMICAILDIPVYKSRIQSLHLLFSLYSEFKNSQHFKALAEGKKAFTPSSNSTSQAGDMET

SEG 



519 



WO 01/12659 



PCT/IB00/01496



PRD hhhhhhhcccchhhhhhhhhhhhhhhhhhhcchhhhhhhhcccccccccccccccccccc 

SEQ LTFS 

SEG 

PRD cccc 
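The [KW] LOW_COMPLEXITY value appears to be the percentage of residues masked with x on the SEG lines. Under that reading the reported values correspond to whole residue counts: 36 of 304 residues (11.84 %) for DKFZphutel_22n2, 51 of 537 (9.50 %) for DKFZphutel_22o2 and 14 of 196 (7.14 %) for DKFZphutel_23el3. The sketch below only does this arithmetic; the one-character-per-residue mask format and the helper name low_complexity_percent are assumptions, not part of the report.

```python
def low_complexity_percent(seg_mask: str) -> float:
    """Percentage of residues masked with 'x' in a SEG-style mask string.

    Assumption: one mask character per residue; 'x' = masked, anything else = unmasked.
    """
    if not seg_mask:
        raise ValueError("empty mask")
    masked = seg_mask.count("x")
    return round(100 * masked / len(seg_mask), 2)


if __name__ == "__main__":
    # Hypothetical masks that reproduce only the counts, not the real masked positions.
    for masked, length in [(36, 304), (51, 537), (14, 196)]:
        mask = "x" * masked + "." * (length - masked)
        print(length, low_complexity_percent(mask))  # 11.84, 9.5, 7.14
```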



Prosite for DKFZphutel_22n2.3



PS00001 4->8 ASN_GLYCOSYLATION PDOC00001

PS00001 159->163 ASN_GLYCOSYLATION PDOC00001

PS00001 290->294 ASN_GLYCOSYLATION PDOC00001

PS00004 17->21 CAMP_PHOSPHO_SITE PDOC00004

PS00004 18->22 CAMP_PHOSPHO_SITE PDOC00004

PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005

PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006

PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006

PS00006 43->47 CK2_PHOSPHO_SITE PDOC00006

PS00006 45->49 CK2_PHOSPHO_SITE PDOC00006

PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006

PS00006 49->53 CK2_PHOSPHO_SITE PDOC00006

PS00006 168->172 CK2_PHOSPHO_SITE PDOC00006

PS00006 181->185 CK2_PHOSPHO_SITE PDOC00006

PS00006 185->189 CK2_PHOSPHO_SITE PDOC00006

PS00006 235->239 CK2_PHOSPHO_SITE PDOC00006

PS00009 280->284 AMIDATION PDOC00009



(No Pfam data available for DKFZphutel_22n2.3)
DKFZphutel_22o2
group: uterus derived 

DKFZphutel_22o2 encodes a novel 537 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.

similarity to S. pombe SPBC3E7.03c
complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 
Locus: /map="11p15.5"
Insert length: 2714 bp 

Poly A stretch at pos. 2695, polyadenylation signal at pos. 2677 

1 GCAGGGCACG GTGGGGGCTG AGATCGTTTC CTGTTGGAAC TTCTGGCCCA 

51 AGAAGCGCGG GTCACAAGGA GAGGGGTCAG TTCGGTTCAG AGCGACTCAG 

101 CCCCTCGACT CGGGTCTTAA AACCTCCGAG CCGCCAGTTC TGCCTCAGGC 

151 CGCGCCCCCT TAAAGCGCCA CCAGACGCTG CGCCCCGTTA AAGCGCCACC 

201 AGACGCCGCG CCCCGTCCCG GCCTCCCCCG CGCGCTGGCG CGGGGCTTTC 

251 TGGGCCAGGG CGGGGCCGGC GAACTGCGGC CCGGAACGGC TGAGGAAGGG 

301 CCCGTCCCGC CTTCCCCGGC GCGCCATGGA GCCCCGGGCG GTTGCAGAAG

351 CCGTGGAGAC GGGTGAGGAG GATGTGATTA TGGAAGCTCT GCGGTCATAC 

401 AACCAGGAGC ACTCCCAGAG CTTCACGTTT GATGATGCCC AACAGGAGGA 

451 CCGGAAGAGA CTGGCGGAGC TGCTGGTCTC CGTCCTGGAA CAGGGCTTGC 

501 CACCCTCCCA CCGTGTCATC TGGCTGCAGA GTGTCCGAAT CCTGTCCCGG 

551 GACCGCAACT GCCTGGACCC GTTCACCAGC CGCCAGAGCC TGCAGGCACT 

601 AGCCTGCTAT GCTGACATCT CTGTCTCTGA GGGGTCCGTC CCAGAGTCCG 

651 CAGACATGGA TGTTGTACTG GAGTCCCTCA AGTGCCTGTG CAACCTCGTG 

701 CTCAGCAGCC CTGTGGCACA GATGCTGGCA GCAGAGGCCC GCCTAGTGGT 

751 GAAGCTCACA GAGCGTGTGG GGCTGTACCG TGAGAGGAGC TTCCCCCACG 

801 ATGTCCAGTT CTTTGACTTG CGGCTCCTCT TCCTGCTAAC GGCACTCCGC 

851 ACCGATGTGC GCCAGCAGCT GTTTCAGGAG CTGAAAGGAG TGCGCCTGCT 

901 AACTGACACA CTGGAGCTGA CGCTGGGGGT GACTCCTGAA GGGAACCCCC 

951 CACCCACGCT CCTTCCTTCC CAAGAGACTG AGCGGGCCAT GGAGATCCTC 

1001 AAAGTGCTCT TCAACATCAC CCTGGACTCC ATCAAGGGGG AGGTGGACGA 

1051 GGAAGACGCT GCCCTTTACC GACACCTGGG GACCCTTCTC CGGCACTGTG 

1101 TGATGATCGC TACTGCTGGA GACCGCACAG AGGAGTTCCA CGGCCACGCA 

1151 GTGAACCTCC TGGGGAACTT GCCCCTCAAG TGTCTGGATG TTCTCCTCAC 

1201 CCTGGAGCCA CATGGAGACT CCACGGAGTT CATGGGAGTG AATATGGATG 

1251 TGATTCGTGC CCTCCTCATC TTCCTAGAGA AGCGTTTGCA CAAGACACAC 

1301 AGGCTGAAGG AGAGTGTAGC TCCCGTGCTG AGCGTGCTGA CTGAATGTGC 

1351 CCGGATGCAC CGCCCAGCCA GGAAGTTCCT GAAGGCCCAG GGATGGCCAC 

1401 CTCCCCAGGT GCTGCCCCCT CTGCGGGATG TGAGGACACG GCCTGAGGTT 

1451 GGGGAGATGC TGCGGAACAA GCTTGTCCGC CTCATGACAC ACCTGGACAC 

1501 AGATGTGAAG AGGGTGGCTG CCGAGTTCTT GTTTGTCCTG TGCTCTGAGA 

1551 GTGTGCCCCG ATTCATCAAG TACACAGGCT ATGGGAATGC TGCTGGCCTT

1601 CTGGCTGCCA GGGGCCTCAT GGCAGGAGGC CGGCCCGAGG GCCAGTACTC 

1651 AGAGGATGAG GACACAGACA CAGATGAGTA CAAGGAAGCC AAAGCCAGCA

1701 TAAACCCTGT GACCGGGAGG GTGGAGGAGA AGCCGCCTAA CCCTATGGAG 

1751 GGCATGACAG AGGAGCAGAA GGAGCACGAG GCCATGAAGC TGGTGACCAT 

1801 GTTTGACAAG CTCTCCAGGA ACAGAGTCAT CCAGCCAATG GGGATGAGTC 

1851 CCCGGGGTCA TCTTACGTCC CTGCAGGATG CCATGTGCGA GACTATGGAG 

1901 CAGCAGCTCT CCTCGGACCC TGACTCGGAC CCTGACTGAG GATGGCAGCT 

1951 CTTCTGCTCC CCCATCAGGA CTGGTGCTGC TTCCAGAGAC TTCCTTGGGG 

2001 TTGCAACCTG GGGAAGCCAC ATCCCACTGG ATCCACACCC GCCCCCACTT 

2051 CTCCATCTTA GAAACCCCTT CTCTTGACTC CCGTTCTGTT CATGATTTGC 

2101 CTCTGGTCCA GTTTCTCATC TCTGGACTGC AACGGTCTTC TTGTGCTAGA 

2151 ACTCAGGCTC AGCCTCGAAT TCCACAGACG AAGTACTTTC TTTTGTCTGC 

2201 GCCAAGAGGA ATGTGTTCAG AAGCTGCTGC CTGAGGGCAG GGCCTACCTG 

2251 GGCACACAGA AGAGCATATG GGAGGGCAGG GGTTTGGGTG TGGGTGCACA 

2301 CAAAGCAAGC ACCATCTGGG ATTGGCACAC TGGCAGAGCC AGTGTGTTGG 

2351 GGTATGTGCT GCACTTCCCA GGGAGAAAAC CTGTCAGAAC TTTCCATACG 

2401 AGTATATCAG AACACACCCT TCCAAGGTAT GTATGCTCTG TTGTTCCTGT 

2451 CCTGTCTTCA CTGAGCGCAG GGCTGGAGGC CTCTTAGACA TTCTCCTTGG 

2501 TCCTCGTTCA GCTGCCCACT GTAGTATCCA CAGTGCCCGA GTTCTCGCTG 

2551 GTTTTGGCAA TTAAACCTCC TTCCTACTGG TTTAGACTAC ACTTACAACA 

2601 AGGAAAATGC CCCTCGTGTG ACCATAGATT GAGATTTATA CCACATACCA 

2651 CACATAGCCA CAGAAACATC ATCTTGAAAT AAAGAAGAGT TTTGGACAAA 
2701 AAAAAAAAAA AAAA 
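The report does not say whether its positions are 0- or 1-based. In the sequence above, the AATAAA hexamer near the 3' end begins at base 2678, one more than the reported polyadenylation signal position of 2677; DKFZphutel_23el3 shows the same offset (AATAAA at base 1811, reported as 1810), which suggests 0-based positions. A minimal sketch of such a search, assuming only the canonical AATAAA hexamer is considered (the report's actual signal definition is not given); find_polya_signal is a hypothetical helper.

```python
from __future__ import annotations

SIGNAL = "AATAAA"  # canonical polyadenylation signal hexamer (assumed definition)


def find_polya_signal(seq: str, offset: int = 0) -> list[int]:
    """Return 0-based start positions of every AATAAA hexamer, shifted by offset.

    Whitespace (such as the 10-base grouping of the listings) is ignored,
    and overlapping occurrences are all reported.
    """
    seq = "".join(seq.split()).upper()
    hits = []
    pos = seq.find(SIGNAL)
    while pos != -1:
        hits.append(offset + pos)
        pos = seq.find(SIGNAL, pos + 1)
    return hits


if __name__ == "__main__":
    # Bases 2651-2714 of DKFZphutel_22o2; base 2651 has 0-based index 2650.
    tail = "CACATAGCCA CAGAAACATC ATCTTGAAAT AAAGAAGAGT TTTGGACAAA AAAAAAAAAA AAAA"
    print(find_polya_signal(tail, offset=2650))  # [2677]
```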
BLAST Results



Entry AF015416 from database EMBL: 

Homo sapiens chromosome 11 from 11p15.5 region, complete sequence.
Score = 3356, P = 2.0e-144, identities = 672/673

Entry HS263253 from database EMBL:
human STS SHGC-15914.
Score = 1143, P = 9.0e-46, identities = 245/255



Medline entries 

No Medline entry 



Peptide information for frame 2 



ORF from 326 bp to 1936 bp; peptide length: 537 
Category: similarity to unknown protein 



1 MEPRAVAEAV ETGEEDVIME ALRSYNQEHS QSFTFDDAQQ EDRKRLAELL 
51 VSVLEQGLPP SHRVIWLQSV RILSRDRNCL DPFTSRQSLQ ALACYADISV 
101 SEGSVPESAD MDVVLESLKC LCNLVLSSPV AQMLAAEARL VVKLTERVGL
151 YRERSFPHDV QFFDLRLLFL LTALRTDVRQ QLFQELKGVR LLTDTLELTL 
201 GVTPEGNPPP TLLPSQETER AMEILKVLFN ITLDSIKGEV DEEDAALYRH 
251 LGTLLRHCVM IATAGDRTEE FHGHAVNLLG NLPLKCLDVL LTLEPHGDST 
301 EFMGVNMDVI RALLIFLEKR LHKTHRLKES VAPVLSVLTE CARMHRPARK 
351 FLKAQGWPPP QVLPPLRDVR TRPEVGEMLR NKLVRLMTHL DTDVKRVAAE 
401 FLFVLCSESV PRFIKYTGYG NAAGLLAARG LMAGGRPEGQ YSEDEDTDTD 
451 EYKEAKASIN PVTGRVEEKP PNPMEGMTEE QKEHEAMKLV TMFDKLSRNR 
501 VIQPMGMSPR GHLTSLQDAM CETMEQQLSS DPDSDPD 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphutel_22o2, frame 2 

TREMBL:SPBC3E7_3 gene: "SPBC3E7.03c"; product: "hypothetical protein";
S. pombe chromosome II cosmid c3E7., N = 1, Score = 112, P = 0.0023

>TREMBL:SPBC3E7_3 gene: "SPBC3E7.03c"; product: "hypothetical protein"
S. pombe chromosome II cosmid c3E7.
Length = 362

HSPs:

Score = 112 (16.8 bits), Expect = 2.3e-03, P = 2.3e-03
Identities = 71/289 (24%), Positives = 124/289 (42%)



Query: 215
SQ+ E + EIL++LF I+ S E DE+ L L+ + +
Sbjct: 12 SQDNEMVLTEILRLLFPISKRSYLKEEDEQKILLLVIEIWASSLNNNPNSPLRW 65

Query: 274 HAVN-LLG-NLPLKCLDVLLTLEPHGDSTEFMGVNMDVIRALLIFLEKRLHKTHRL 327
HA N LL NL L LD + + T + +I + +LEK L+ +
Sbjct: 66

Query: 328
+ ++ P+L++L + +LPDR++G+R L+RL
Sbjct: 122 QNTLPPILAILLSLLSFFNIKQNLSMLLFPTNDDRKQSLQKGKSFRCLLLRL 173

Query: 387 MT-HLDTDVKRVAAEFLFVLCSESVPRFIKYTGYGNAAGLLAARGLMAGGRPEGQYS 442
+T++ ALLC + + GGAG+ M P+ +
Sbjct: 174 LTIPIVEPIGTYYASLLNELCDGDSQQIARIFGAGYAMGISQHSETMPFPSPLSKAASPV 233

Query: 443 -EDEDTDTDEYKEAKASINPVTGRV--EEKPPNPMEGMTEEQKEHEAMKLVTMFDKLSRN 499
+ + +E +I+P+TG + +E +++E+KE EA +L +F +L +N
Sbjct: 234
Query: 500 RVIQ 503
IQ 

Sbjct: 293 STIQ 296 



Pedant information for DKFZphutel_22o2, frame 2 
Report for DKFZphutel_22o2.2



[LENGTH] 537

[MW] 60372.53

[pI] 5.20

[BLOCKS] BL00415L Synapsins proteins

[PROSITE] MYRISTYL 4

[PROSITE] CK2_PHOSPHO_SITE 13

[PROSITE] PKC_PHOSPHO_SITE 10

[PROSITE] ASN_GLYCOSYLATION 1

[KW] All_Alpha

[KW] LOW_COMPLEXITY 9.50 %



SEQ MEPRAVAEAVETGEEDVIMEALRSYNQEHSQSFTFDDAQQEDRKRLAELLVSVLEQGLPP 

SEG 

PRD ccchhhhhhhhhccchhhhhhhhhhccccccceeeccchhhhhhhhhhhhhhhhhccccc 

SEQ SHRVIWLQSVRILSRDRNCLDPFTSRQSLQALACYADISVSEGSVPESADMDVVLESLKC 

SEG 

PRD cceeeeeccccccccccccccccchhhhhhhhhhhhceeeeccccccccchhhhhhhhhh 

SEQ LCNLVLSSPVAQMLAAEARLVVKLTERVGLYRERSFPHDVQFFDLRLLFLLTALRTDVRQ 

SEG xxxxxxxxxxxxxxx

PRD hhhhccccchhhhhhhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhhhhhhh 

SEQ QLFQELKGVRLLTDTLELTLGVTPEGNPPPTLLPSQETERAMEILKVLFNITLDSIKGEV 

SEG 

PRD hhhhhhchhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhhhhhhccccchhh 

SEQ DEEDAALYRHLGTLLRHCVMIATAGDRTEEFHGHAVNLLGNLPLKCLDVLLTLEPHGDST

SEG 

PRD hhhhhhhhhhhhhhhhhhhhccccccccccccceeeeecccccccceeeeeeeccccccc 

SEQ EFMGVNMDVIRALLIFLEKRLHKTHRLKESVAPVLSVLTECARMHRPARKFLKAQGWPPP

SEG 

PRD eeeehhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhhhchhhhhhhhccccccc 

SEQ QVLPPLRDVRTRPEVGEMLRNKLVRLMTHLDTDVKRVAAEFLFVLCSESVPRFIKYTGYG

SEG xxx 

PRD cccccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhcccccceeeecccc 

SEQ NAAGLLAARGLMAGGRPEGQYSEDEDTDTDEYKEAKASINPVTGRVEEKPPNPMEGMTEE 

SEG xxxxxxxxxxxxxxx xxxxxxxxx 

PRD chhhhhhhhhccccccccccccccccccchhhhhhhhhccccccceeecccccccchhhh 

SEQ QKEHEAMKLVTMFDKLSRNRVIQPMGMSPRGHLTSLQDAMCETMEQQLSSDPDSDPD 

SEG xxxxxxxxx 

PRD hhhhhhhhhhhhhhhcccccccccccccccccchhhhhhhhhhhhhhhhcccccccc 



Prosite for DKFZphutel_22o2.2



PS00001 230->234 ASN_GLYCOSYLATION PDOC00001

PS00005 61->64 PKC_PHOSPHO_SITE PDOC00005

PS00005 69->72 PKC_PHOSPHO_SITE PDOC00005

PS00005 84->87 PKC_PHOSPHO_SITE PDOC00005

PS00005 117->120 PKC_PHOSPHO_SITE PDOC00005

PS00005 145->148 PKC_PHOSPHO_SITE PDOC00005

PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005

PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005

PS00005 324->327 PKC_PHOSPHO_SITE PDOC00005

PS00005 463->466 PKC_PHOSPHO_SITE PDOC00005

PS00005 508->511 PKC_PHOSPHO_SITE PDOC00005

PS00006 12->16 CK2_PHOSPHO_SITE PDOC00006

PS00006 34->38 CK2_PHOSPHO_SITE PDOC00006

PS00006 52->56 CK2_PHOSPHO_SITE PDOC00006

PS00006 99->103 CK2_PHOSPHO_SITE PDOC00006

PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006

PS00006 263->267 CK2_PHOSPHO_SITE PDOC00006

PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006

PS00006 388->392 CK2_PHOSPHO_SITE PDOC00006

PS00006 442->446 CK2_PHOSPHO_SITE PDOC00006

PS00006 447->451 CK2_PHOSPHO_SITE PDOC00006

PS00006 491->495 CK2_PHOSPHO_SITE PDOC00006

PS00006 515->519 CK2_PHOSPHO_SITE PDOC00006

PS00006 530->534 CK2_PHOSPHO_SITE PDOC00006

PS00008 57->63 MYRISTYL PDOC00008

PS00008 420->426 MYRISTYL PDOC00008

PS00008 424->430 MYRISTYL PDOC00008

PS00008 430->436 MYRISTYL PDOC00008



(No Pfam data available for DKFZphutel_22o2.2) 
DKFZphutel_23el3



group: metabolism 

DKFZphutel_23el3 encodes a novel 196 amino acid protein with similarity to 27K heat shock
proteins.

The novel protein contains the aspartic acid active-site signature of the subtilase family of
serine proteases (PROSITE SUBTILASE_ASP, residues 28-39). Subtilases are an extensive family of
serine proteases whose catalytic activity is provided by a charge relay system similar to that of
the trypsin family of serine proteases, but which evolved independently by convergent evolution.
The sequences around the residues of the catalytic triad (aspartic acid, serine and histidine) are
completely different from those around the analogous residues in the trypsin serine proteases.
Thus the novel protein is a new member of this family.

The new protein can find application in modulating proteinase activity in cells and as a
new enzyme for proteomics and biotechnological production processes.



heat shock protein HSP27 

strong similarity to heat shock 27K proteins 
complete cDNA, complete cds, EST hits 
Sequenced by EMBL 

Locus: /map="578.9 cR from top of Chr12 linkage group"
Insert length: 1854 bp 

Poly A stretch at pos. 1831, polyadenylation signal at pos. 1810 



1 GGTTTATTAA GCTCCTGGCT CCGCTCTAGA CCTCAGCGGT TCTGGCTGCC

51 AGCCTGGGCA GCCTGGGAAG CCTGGGAGGA CGGTGGCTTG CCGGTCTGTC

101 GTGAGGCAGT GCGGACGGGG ACCCTCTGGG ATTCTGCTGG ATCTGCCCCG

151 GGGGTTACCT TTGGGGGCTG GGACCCCAGT CGAGGGGACA CAACCGTCCC

201 TGGCAGTGGT TGGTTCTGCT TCTCCCTGCA GAAAAGCAGC ATTTTCGGAA

251 GCTGAAGAAT AAGCTAGCCC AGCCACACCA CCTTGTTGTG TGACCTTGGG

301 CAGGTGGTTC TGTCTCTCTG AGCCTCTGTT TCTCTCTGAG CTGAGCAGCC

351 ACCATGGCTG ACGGTCAGAT GCCCTTCTCC TGCCACTACC CAAGCCGCCT

401 GCGCCGAGAC CCCTTCCGGG ACTCTCCCCT CTCCTCTCGC CTGCTGGATG

451 ATGGCTTTGG CATGGACCCC TTCCCAGACG ACTTGACAGC CTCTTGGCCC

501 GACTGGGCTC TGCCTCGTCT CTCCTCCGCC TGGCCAGGCA CCCTAAGGTC

551 GGGCATGGTG CCCCGGGGCC CCACTGCCAC CGCCAGGTTT GGGGTGCCTG

601 CCGAGGGCAG GACCCCCCCA CCCTTCCCTG GGGAGCCCTG GAAAGTGTGT

651 GTGAATGTGC ACAGCTTCAA GCCAGAGGAG TTGATGGTGA AGACCAAAGA

701 TGGATACGTG GAGGTGTCTG GCAAACATGA AGAGAAACAG CAAGAAGGTG

751 GCATTGTTTC TAAGAACTTC ACAAAGAAAA TCCAGCTTCC TGCAGAGGTG

801 GATCCTGTGA CAGTATTTGC CTCACTTTCC CCAGAGGGTC TGCTGATCAT

851 CGAAGCTCCC CAGGTCCCTC CTTACTCAAC ATTTGGAGAG AGCAGTTTCA

901 ACAACGAGCT TCCCCAGGAC AGCCAGGAAG TCACCTGTAC CTGAGATGCC

951 AGTACTGGCC CATCCTTGTT TTGTCCCCAA CCCTAGGGCT TCTCTGATTC

1001 CAGGATACAT TACTTTAGCT GAACTCAGAT TTAGTGCAAG TAAAATGTTA

1051 GAGGGTGCGG GGGTGAGGAC TGACCACAGA TTCCCTGGAT AGTGTAGTGG

1101 TAGATTTCTC CACAGGATAG CGCAATTGGC AAATCATGCT TGGTTGTGTT

1151 AGGCCAAAAT ACTAGTTTTG CTTTCTTTAC CTTTTCTATC TTGATGAAAA

1201 TGTTGCACAT TCTATAGTTG CAAAACACAT AAAAGGGGAC TTAACATTTC

1251 ACGTTGTATC TTACTTGCAG TGAATGCAAG GGTTACTTTT CTCTGGGGAC

1301 CTCCCCCATC ACCCAGGTTC CTACTCTGGG CTCCCGATTC CCATGGCTCC

1351 CAAACCATGC CGCATGGTTT GGTTAATGAA ACCCAGTAGC TAACCCCACT

1401 GTGCTTCCAC ATGCCTGGCC TAAAATGGGT GATATACAGG TCTTATATCC

1451 CCATATGGAA TTTATCCATC AACCACATAA AAACAAACAG TGCCTTCTGC

1501 CCTCTGCCCA GATGTGTCCA GCACGTTCTC AAAGTTTCCA CATTAGCACT

1551 CCCTAAGGAC GCTGGGAGCC TGTCAGTTTA TGATCTGACC TAGGTCCCCC

1601 CTTTCTTCTG TCCCCTGTGT TTAAGTCGGG ATTTTTACAG AGGGAGCTGT

1651 CTCCAGACAG CTCCATCAGG AACCAAGCAA AGGCCAGATA GCCTGACAGA

1701 TAGGCTAGTG GTATTGTGTA TATGGGCGGG ACGTGTGTGT CATTATTATT

1751 TGAGTTATGC TGTTGTTTAG GGGTAAATAA CAGTAAATAA TTAATAATAA

1801 TAATAATAAT AATAAAGGAG CTGACGTTCT TAAAAAAGAA AAAAAAAAAA

1851 AAAA



BLAST Results 



Entry HS286348 from database EMBL:
human STS TIGR-A002J47.
Score = 510, P = 1.2e-16, identities = 102/102



Medline entries



95394379: 

Cloning and sequencing of a cDNA encoding the canine HSP27 protein. 
94110260: 

Physiological and pathological changes in levels of the two 
small stress proteins, HSP27 and alpha B crystallin, in rat 
hindlimb muscles 



Peptide information for frame 3 



ORF from 354 bp to 941 bp; peptide length: 196 
Category: strong similarity to known protein 
Prosite motifs: SUBTILASE_ASP (28-39)



1 MADGQMPFSC HYPSRLRRDP FRDSPLSSRL LDDGFGMDPF PDDLTASWPD 
51 WALPRLSSAW PGTLRSGMVP RGPTATARFG VPAEGRTPPP FPGEPWKVCV 
101 NVHSFKPEEL MVKTKDGYVE VSGKHEEKQQ EGGIVSKNFT KKIQLPAEVD 
151 PVTVFASLSP EGLLIIEAPQ VPPYSTFGES SFNNELPQDS QEVTCT

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_23el3, frame 3 

PIR:JC4244 heat-shock 27K protein - dog, N = 1, Score = 304, P =
4.3e-27

PIR:JN0924 heat shock 27 protein - rat, N = 1, Score = 301, P = 8.9e-27

TREMBL:MM03561_1 product: "heat shock protein HSP27"; Mus musculus
heat shock protein HSP27 internal deletion variant b mRNA, complete
cds., N = 1, Score = 301, P = 8.9e-27



>PIR:JC4244 heat-shock 27K protein - dog
Length = 209

HSPs:

Score = 304 (45.6 bits), Expect = 4.3e-27, P = 4.3e-27
Identities = 80/182 (43%), Positives = 102/182 (56%)

Query: 1 MADGQMPFSC-HYPSRLRRDPFRD-SPLSSRLLDDGFGMDPFPDDLTASWPDWALPRLSS 58
M + ++PFS PS DPFRD P SRL D FG+ P++ WW S
Sbjct: 1 MTERRVPFSLLRSPSW DPFRDWYPAHSRLFDQAFGLPRLPEE WAQWFG HS 50

Query: 59 AWPGTLRSGMV P RGPT AT ARFGV P AEGR -- TPPPFPG EPWKVCVNVHSF 105
WPG +R +P GP A A PA R + G + W+V ++V+ F
Sbjct: 51 GWPGYVRP -- IPPAVEGPAAAAAAAAPAYSRALSRQLSSGVSEIRQTADRWRVSLDVNHF 108

Query: 106 KPEELMVKTKDGYVEVSGKHEEKQQEGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLI 165
PEEL VKTKDG VE++GKHEE+Q E G +S+ T K LP VDP V +SLSPEG L
Sbjct: 109 APEELTVKTKDGVVEITGKHEERQDEHGYISRRLTPKYTLPPGVDPTLVSSSLSPEGTLT 168

Query: 166 IEAPQVPPYSTFGE 179
+EAP P + E
Sbjct: 169 VEAPMPKPATQSAE 182





Pedant information for DKFZphutel_23el3, frame 3 



Report for DKFZphutel_23el3.3 



[LENGTH] 196 

[MW] 21604.37 
[pI] 5.00

[HOMOL] PIR:JC4244 heat-shock 27K protein - dog 3e-22

[BLOCKS] BL01031C

[PIRKW] blocked amino end 1e-13

[PIRKW] acetylated amino end 4e-13

[PIRKW] phosphoprotein 7e-21

[PIRKW] glycoprotein 2e-11

[PIRKW] heat shock 7e-21

[PIRKW] molecular chaperone 4e-13

[PIRKW] alternative splicing 1e-19

[PIRKW] eye lens 6e-14

[PIRKW] stress-induced protein 7e-21

[SUPFAM] alpha-crystallin 7e-21

[PROSITE] SUBTILASE_ASP 1

[PROSITE] MYRISTYL 2

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 6

[PROSITE] ASN_GLYCOSYLATION 1

[PFAM] Heat shock hsp20 proteins

[KW] All_Beta

[KW] LOW_COMPLEXITY 7.14 %



SEQ MADGQMPFSCHYPSRLRRDPFRDSPLSSRLLDDGFGMDPFPDDLTASWPDWALPRLSSAW 

SEG xxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccchhhhhcccccccccccccccccccccccccccc 

SEQ PGTLRSGMVPRGPTATARFGVPAEGRTPPPFPGEPWKVCVNVHSFKPEELMVKTKDGYVE

SEG 

PRD cccccccccccccchhhhhhhhccccccchhhhhhheeeeeecccccceeeeecccceee 

SEQ VSGKHEEKQQEGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLIIEAPQVPPYSTFGES 

SEG 

PRD eccchhhhhcccceeeeccccccccccccccceeeecccccceeeeeccccccccccccc 

SEQ SFNNELPQDSQEVTCT 

SEG 

PRD cccccccccceeeccc 



Prosite for DKFZphutel_23el3.3



PS00001 138->142 ASN_GLYCOSYLATION PDOC00001

PS00005 27->30 PKC_PHOSPHO_SITE PDOC00005

PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005

PS00005 76->79 PKC_PHOSPHO_SITE PDOC00005

PS00005 104->107 PKC_PHOSPHO_SITE PDOC00005

PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005

PS00005 140->143 PKC_PHOSPHO_SITE PDOC00005

PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006

PS00006 176->180 CK2_PHOSPHO_SITE PDOC00006

PS00008 62->68 MYRISTYL PDOC00008

PS00008 132->138 MYRISTYL PDOC00008

PS00136 28->39 SUBTILASE_ASP PDOC00125
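The PROSITE hits above are pattern matches, and four of the short patterns involved can be written as regular expressions: PS00001 N-{P}-[ST]-{P}, PS00005 [ST]-x-[RK], PS00006 [ST]-x(2)-[DE] and PS00008 G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}. The Python sketch below illustrates that idea and is not the pipeline that produced this report: it scans the DKFZphutel_23el3 peptide with lookahead regexes, so overlapping sites are all found, and returns 1-based start positions. These agree with the first number of each start->end pair in the table; the second number appears to be the start plus the pattern length. PATTERNS and scan are hypothetical names.

```python
from __future__ import annotations

import re

# Regex versions of four PROSITE patterns; "(?=...)" lets finditer report overlaps.
PATTERNS = {
    "ASN_GLYCOSYLATION": re.compile(r"(?=N[^P][ST][^P])"),      # PS00001
    "PKC_PHOSPHO_SITE": re.compile(r"(?=[ST].[RK])"),            # PS00005
    "CK2_PHOSPHO_SITE": re.compile(r"(?=[ST]..[DE])"),           # PS00006
    "MYRISTYL": re.compile(r"(?=G[^EDRKHPFYW]..[STAGCN][^P])"),  # PS00008
}

# DKFZphutel_23el3 peptide (frame 3), copied from the peptide listing.
PEPTIDE_23EL3 = "".join("""
MADGQMPFSC HYPSRLRRDP FRDSPLSSRL LDDGFGMDPF PDDLTASWPD
WALPRLSSAW PGTLRSGMVP RGPTATARFG VPAEGRTPPP FPGEPWKVCV
NVHSFKPEEL MVKTKDGYVE VSGKHEEKQQ EGGIVSKNFT KKIQLPAEVD
PVTVFASLSP EGLLIIEAPQ VPPYSTFGES SFNNELPQDS QEVTCT
""".split())


def scan(peptide: str) -> dict[str, list[int]]:
    """Return the 1-based start position of every match for each pattern."""
    return {
        name: [m.start() + 1 for m in pattern.finditer(peptide)]
        for name, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    for name, starts in scan(PEPTIDE_23EL3).items():
        print(name, starts)
    # ASN_GLYCOSYLATION [138]; PKC_PHOSPHO_SITE [27, 63, 76, 104, 122, 140];
    # CK2_PHOSPHO_SITE [47, 176]; MYRISTYL [62, 132]
```

Plain (non-lookahead) matching would consume residues and could miss overlapping sites, such as the CK2 pairs 45->49 and 47->51 reported for DKFZphutel_22n2.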



Pfam for DKFZphutel_23el3.3



HMM_NAME Heat shock hsp20 proteins

HMM *AMMrpPWDWRE DpDHFeVrMDMPGFKPEEI KVkVEDNNVLvIeG
A P++ R + ++V++++ FKPEE+ VK+ D+ +++++G
Query 77 ARFGVPAEGR-TPPPFPGEPWKVCVNVHSFKPEELMVKTKDG-YVEVSG 123

HMM EHEREEEREDDkWWWHERIYRHFMRRFrLPENVDpDqlkAsMSdNGVLTI
+HE E++ + + ++ F +++LP +VDP + AS+S++G+L I
Query 124 KHE EKQQ EGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLII 166

HMM TVPKpEP*
++P ++P
Query 167 EAPQVPP 173
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DKFZphutel_23gll 



group: uterus derived 

DKFZphutel_23gll encodes a novel 256 amino acid protein with similarity to S. pombe
SPAC31G5.12c and S. cerevisiae Maf1p.

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.



similarity to SPAC31G5.12c and Maf1p

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 1674 bp 

Poly A stretch at pos. 1664, polyadenylation signal at pos. 1644 
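The signal positions in these entries appear to be 0-based offsets. In the numbered sequence below, this clone's AATAAA hexamer starts at nucleotide 1645, one more than the reported 1644. The sketch below scans for signals under that assumption. It checks only the canonical AATAAA and the common variant ATTAAA, and `polya_signals` is a hypothetical helper.

```python
# Hexamers checked: the canonical polyadenylation signal and one common variant.
SIGNALS = ("AATAAA", "ATTAAA")


def polya_signals(fragment: str, offset: int = 0) -> list[tuple[int, str]]:
    """Return (0-based position, hexamer) for every signal hexamer in fragment.

    `offset` is the 0-based position of fragment[0] within the full insert.
    """
    hits = []
    for i in range(len(fragment) - 5):
        hexamer = fragment[i:i + 6]
        if hexamer in SIGNALS:
            hits.append((offset + i, hexamer))
    return hits


# Nucleotides 1641-1674 (1-based) of DKFZphutel_23gll, i.e. 0-based offset 1640.
TAIL = "TTCCAATAAA" "AGTTTCTGTG" "ACTTAAAAAA" "AAAA"
print(polya_signals(TAIL, offset=1640))  # [(1644, 'AATAAA')]
```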



1 GGGGGAGGCG GAGGTCGCTC GCTCGCTCGC TCGGCTCGCT GACTCGCCGG 

51 AGCGCTCTGT GGCGGTCGGC GGCAGGTCGG TCGCGAGAGC GGGCTCTGTG 

101 GAAGGGGGCG AGGCTATGTC GCGGTGGCAG CCCGGATGGG CCGGCAGGGC 

151 CGGGAGTAAC GGGACGTCGC CGCGGAGCTT CTTCCCCCGG ATACAGTGCG 

201 GCCCGAGCGG AGGCCGCGGC GCCGCCCTCC GATCTTGAAG AGCCCGCGCT 

251 GCGCGGAGCC CGCCCCCGCC TGCGCACCGG CACCGACGCG GAGCGACCAG 

301 CCCAGCCAGA CCCGGCCCGG CGCGGCCTGA TCTAACCCAG CCAGGCAGGC 

351 AATACTAGCC CCTCTGGAGC ACGGAGCTCC TTCCCCAAAG ACATGAAGCT 

401 ATTGGAGAAC TCGAGCTTTG AAGCCATCAA CTCACAGCTG ACTGTGGAGA 

451 CCGGAGATGC CCACATCATT GGCAGGATTG AGAGCTACTC ATGTAAGATG 

501 GCAGGAGACG ACAAACACAT GTTCAAGCAG TTCTGCCAGG AGGGCCAGCC 

551 CCACGTGCTG GAGGCACTTT CTCCACCCCA GACTTCAGGA CTGAGCCCCA 

601 GCAGACTCAG CAAAAGCCAA GGCGGTGAGG AGGAGGGCCC CCTCAGTGAC 

651 AAGTGCAGCC GCAAGACCCT CTTCTACCTG ATTGCCACGC TCAATGAGTC 

701 CTTCAGGCCT GACTATGACT TCAGCACAGC CCGCAGCCAT GAGTTCAGCC 

751 GGGAGCCCAG CCTTAGCTGG GTGGTGAATG CAGTCAACTG CAGTCTGTTC 

801 TCAGCTGTGC GGGAGGACTT CAAGGATCTG AAACCACAGC TGTGGAACGC 

851 GGTGGACGAG GAGATCTGCC TGGCTGAATG TGACATCTAC AGCTATAACC 

901 CAGACTTGGA CTCAGATCCC TTCGGGGAGG ATGGTAGCCT CTGGTCCTTC 

951 AACTACTTCT TCTACAACAA GCGGCTCAAG CGAATCGTCT TCTTTAGCTG 

1001 CCGTTCCATC AGTGGCTCCA CCTACACACC CTCAGAGGCA GGCAACGAGC 

1051 TGGACATGGA GCTGGGGGAG GAGGAGGTGG AGGAAGAAAG CAGAAGCAGG 

1101 GGCAGTGGGG CCGAGGAGAC CAGCACCATG GAGGAGGACA GGGTCCCAGT 

1151 GATCTGTATT TGATGAGGAG GAGCCGAGGC CCCAGCTTCA TCCAGCTTCA 

1201 ACCAATGCCT GGACCTGTCC ACCTGAGAGG CCCCTGGGGC CTCCCCAGCT 

1251 GCTGGCCAGA CCCTGGCGCT GCCACAGTCC TGGCACTGCC CAAGGCCATA 

1301 CCTGCCTAGC CCTTTGGCTC CATCCTGTGG ATGCCCACTC ACCCCTCAGA 

1351 CTCCTGCTGC CCATGCTGTG GCCGGACTTG TCAGCAGGGG GCCTGGTGGG 

1401 AGGAGCGACT GCCCTGCCCA AATGAACTGC CACAGCAGGG ACAGCTGGAC 

1451 CGCAGAGTTT ATTTTTGTAT TTCTACTGGG CCTGCACACT CCAGCCCAAA 

1501 GGGTCTGTGG CCGGAGGCCC CACGAGCAGG CCCCAGCAGT CACCGGCTCT 

1551 GGTCTTGGGC CGGCCCCGGT GCCCACCTGT ACCCCCACCT CGCCCATTTG 

1601 GCCGCGTGCA CTGAGTGTCA CTTTGCTGCA GCTCGTTTCT TTCCAATAAA 

1651 AGTTTCTGTG ACTTAAAAAA AAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 393 bp to 1160 bp; peptide length: 256 
Category: similarity to known protein 
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1 MKLLENSSFE AINSQLTVET GDAHIIGRIE SYSCKMAGDD KHMFKQFCQE 
51 GQPHVLEALS PPQTSGLSPS RLSKSQGGEE EGPLSDKCSR KTLFYLIATL 
101 NESFRPDYDF STARSHEFSR EPSLSWVVNA VNCSLFSAVR EDFKDLKPQL 
151 WNAVDEEICL AECDIYSYNP DLDSDPFGED GSLWSFNYFF YNKRLKRIVF
201 FSCRSISGST YTPSEAGNEL DMELGEEEVE EESRSRGSGA EETSTMEEDR 
251 VPVICI 
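Each ORF line pairs 1-based nucleotide coordinates with a peptide length, and codon arithmetic relates the two. For this clone the span is 1160 - 393 + 1 = 768 nt, i.e. 256 codons. The minimal sketch below assumes the reported end excludes the stop codon, as the TGA at nucleotides 1161-1163 of the listed insert suggests. `peptide_length` is a hypothetical helper.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive nucleotide coordinates.

    Assumes the reported end is the last base of the last sense codon, i.e. the
    stop codon is not part of the reported ORF.
    """
    span = orf_end - orf_start + 1
    if span % 3:
        raise ValueError(f"ORF span of {span} nt is not a whole number of codons")
    return span // 3


print(peptide_length(393, 1160))  # 256
```

The same relation holds for the other entries in this section: 125-709 gives 195, 184-861 gives 226, and 315-2027 gives 571.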

BLASTP hits 
Entry SPAC31G5_12 from database TREMBL: 

gene: "SPAC31G5.12c"; product: "hypothetical protein"; S. pombe
chromosome I cosmid c31G5.

Score = 272, P = 9.3e-24, identities = 51/127, positives = 80/127
Entry SPD656_1 from database TREMBL:

product: "ORF N150"; Yeast DNA for bfr2+ protein/pad1+ protein/sks1+
protein, ORF N313, ORF N150, complete cds, and for ORF N118, partial
cds.

Score = 263, P = 8.4e-23, identities = 50/127, positives = 79/127
Entry S50986 from database PIR:

MAF1 protein - yeast (Saccharomyces cerevisiae) >SWISSPROT:MAF1_YEAST
MAF1 PROTEIN. >TREMBL:SCI 94 92_1 gene: "MAF1"; product: "Maf1p";
Saccharomyces cerevisiae Maf1p (MAF1) gene, complete cds.
>TREMBL:SC8119_11 gene: "MAF1p"; product: "Maf1p"; S. cerevisiae
chromosome IV cosmid 8119.

Score = 180, P = 2.3e-17, identities = 43/133, positives = 75/133

Entry AF098499_2 from database TREMBL:

gene: "C43H8.2"; Caenorhabditis elegans cosmid C43H8.

Score = 263, P = 9.2e-23, identities = 78/252, positives = 118/252 



Alert BLASTP hits for DKFZphutel_23gll, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_23gll, frame 3 

Report for DKFZphutel_23gll.3

[LENGTH] 256 

[MW] 28869.95 

[pI] 4.51

[HOMOL] TREMBL:SPAC31G5_12 gene: "SPAC31G5.12c"; product: "hypothetical protein";

S.pombe chromosome I cosmid c31G5. 4e-23 

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR005c]

6e-13 

[PROSITE] MYRISTYL 3 

[PROSITE] CK2_PHOSPHO_SITE 5 

[PROSITE] PKC_PHOSPHO_SITE 6

[PROSITE] ASN_GLYCOSYLATION 3



[KW] All_Alpha 

[KW] LOW_COMPLEXITY 7.81 % 



SEQ MKLLENSSFEAINSQLTVETGDAHIIGRIESYSCKMAGDDKHMFKQFCQEGQPHVLEALS

SEG 

PRD cccccchhhhhhhhhhhhccccceeeeecccchhhhhccchhhhhhhhhcccceeeeccc 

SEQ PPQTSGLSPSRLSKSQGGEEEGPLSDKCSRKTLFYLIATLNESFRPDYDFSTARSHEFSR 

SEG 

PRD cccccccccccccccccccccccccccchhhhhhhhhhhhcccccccccccccccccccc 

SEQ EPSLSWVVNAVNCSLFSAVREDFKDLKPQLWNAVDEEICLAECDIYSYNPDLDSDPFGED

SEG 

PRD ccccccchhhhhhhhhhhhhchhhhhhhhhhhhhhhhccccccceeeccccccccccccc 

SEQ GSLWSFNYFFYNKRLKRIVFFSCRSISGSTYTPSEAGNELDMELGEEEVEEESRSRGSGA 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccceeeceeechhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhccccccc 

SEQ EETSTMEEDRVPVICI 

SEG xx

PRD cccccccccceeeccc 
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Prosite for DKFZphutel_23gll.3



PS00001 6->10 ASN_GLYCOSYLATION PDOC00001

PS00001 101->105 ASN_GLYCOSYLATION PDOC00001

PS00001 132->136 ASN_GLYCOSYLATION PDOC00001

PS00005 33->36 PKC_PHOSPHO_SITE PDOC00005

PS00005 85->88 PKC_PHOSPHO_SITE PDOC00005

PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005

PS00005 103->106 PKC_PHOSPHO_SITE PDOC00005

PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005

PS00005 202->205 PKC_PHOSPHO_SITE PDOC00005

PS00006 7->11 CK2_PHOSPHO_SITE PDOC00006

PS00006 99->103 CK2_PHOSPHO_SITE PDOC00006

PS00006 212->216 CK2_PHOSPHO_SITE PDOC00006

PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006

PS00006 244->248 CK2_PHOSPHO_SITE PDOC00006

PS00008 66->72 MYRISTYL PDOC00008

PS00008 181->187 MYRISTYL PDOC00008

PS00008 239->245 MYRISTYL PDOC00008



(No Pfam data available for DKFZphutel_23gll.3)
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DKFZphutel_24cl9 



group: transmembrane protein 

DKFZphutel_24cl9 encodes a novel 195 amino acid protein without similarity to known proteins.
The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific 
genes and as a new marker for uterine cells. 

unknown 

membrane regions: 1 

Summary DKFZphutel_24cl9 encodes a novel 195 amino acid protein with
no similarity to known proteins.



unknown 

complete cDNA, complete cds, EST hits 
TRANSMEMBRANE 1 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 769 bp 

Poly A stretch at pos. 746, polyadenylation signal at pos. 735 



1 ACGAGTCAGC CAAAGATGGC TGCGCCCAGG TAATTTGAGC AAAGGCCACA 

51 GTGAACTCCG GCGTGGCTGA GGAAGACCGG AGGAGGCACC CACAGGCTGC 

101 TGGGAGGAGA GCATAAGGCT CAAAATGGAA AATCATAAAT CCAATAATAA 

151 GGAAAACATA ACAATTGTTG ATATATCCAG AAAAATTAAC CAGCTTCCAG 

201 AAGCAGAAAG GAATCTACTT GAAAATGGAT CGGTTTATGT TGGATTAAAT 

251 GCTGCTCTTT GTGGCCTCAT AGCAAACAGT CTTTTTCGAC GCATCTTGAA 

301 TGTGACAAAG GCTCGCATAG CTGCTGGCTT ACCAATGGCA GGGATACCTT 

351 TTCTTACAAC AGACTTAACT TACAGATGTT TTGTAAGTTT TCCTTTGAAT 

401 ACAGGTGATT TGGATTGTGA AACCTGTACC ATAACACGGA GTGGACTGAC 

451 TGGTCTTGTT ATTGGTGGTC TATACCCTGT TTTCTTGGCT ATACCTGTAA 

501 ATGGTGGTCT AGCAGCCAGG TATCAATCAG CTCTGTTACC ACACAAAGGG 

551 AACATCTTAA GTTACTGGAT TAGAACTTCT AAGCCTGTCT TTAGAAAGAT 

601 GTTATTTCCT ATTTTGCTCC AGACTATGTT TTCAGCATAC CTTGGGTCTG 

651 AACAATATAA ACTACTTATA AAGGCCCTTC AGTTATCTGA ACCTGGCAAA 

701 GAAATTCACT GATTTTAAAC AAATATGTAA ACAAAAATAA AATGGTAAAA 

751 ACAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 125 bp to 709 bp; peptide length: 195 
Category: putative protein 



1 MENHKSNNKE NITIVDISRK INQLPEAERN LLENGSVYVG LNAALCGLIA 
51 NSLFRRILNV TKARIAAGLP MAGIPFLTTD LTYRCFVSFP LNTGDLDCET 
101 CTITRSGLTG LVIGGLYPVF LAIPVNGGLA ARYQSALLPH KGNILSYWIR
151 TSKPVFRKML FPILLQTMFS AYLGSEQYKL LIKALQLSEP GKEIH 
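Translation offers a second check on the ORF coordinates. The sketch below builds the standard genetic code table and translates the first 36 nucleotides of the DKFZphutel_24cl9 ORF (bases 125-160 of the listed insert). The result should be the start of the frame 2 peptide above, MENHKSNNKENI. It is a minimal illustration that assumes an uppercase, in-frame DNA sequence without ambiguity codes. `translate` and `CODON_TABLE` are hypothetical names.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Nucleotides 121-160 of DKFZphutel_24cl9; the ORF starts at 125 (index 4 here).
FRAGMENT = "CAAAATGGAA" "AATCATAAAT" "CCAATAATAA" "GGAAAACATA"
print(translate(FRAGMENT[125 - 121:]))  # MENHKSNNKENI
```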

BLASTP hits 

No BLASTP hits available 
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Alert BLASTP hits for DKFZphutel_24cl9, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphutel_24cl9, frame 2 

Report for DKFZphutel_24cl9.2 



[LENGTH] 195

[MW] 21527.45

[pI] 9.36

[PROSITE] MYRISTYL 6

[PROSITE] CK2_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 3

[KW] TRANSMEMBRANE 1



SEQ MENHKSNNKENITIVDISRKINQLPEAERNLLENGSVYVGLNAALCGLIANSLFRRILNV
PRD cccccccccceeeeeehhhhhhccchhhhhhhccccceeeecchhhhhhhhhhhhhhhhh
MEM

SEQ TKARIAAGLPMAGIPFLTTDLTYRCFVSFPLNTGDLDCETCTITRSGLTGLVIGGLYPVF
PRD hhhhhhhccccccceeeeecccccccccccccccccccccccccccccceeeecccceee
MEM                                               MMMMMMMMMMMMMM

SEQ LAIPVNGGLAARYQSALLPHKGNILSYWIRTSKPVFRKMLFPILLQTMFSAYLGSEQYKL
PRD eeeccccccchhhhhhccccccceeeeeeecccchhhhhchhhhhhhhhhhhhhcchhhhh
MEM MMM

SEQ LIKALQLSEPGKEIH
PRD hhhhhhhcccccccc
MEM



Prosite for DKFZphutel_24cl9.2



PS00001 11->15 ASN_GLYCOSYLATION PDOC00001

PS00001 34->38 ASN_GLYCOSYLATION PDOC00001

PS00001 59->63 ASN_GLYCOSYLATION PDOC00001

PS00005 18->21 PKC_PHOSPHO_SITE PDOC00005

PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005

PS00005 151->154 PKC_PHOSPHO_SITE PDOC00005

PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006

PS00008 40->46 MYRISTYL PDOC00008

PS00008 47->53 MYRISTYL PDOC00008

PS00008 68->74 MYRISTYL PDOC00008

PS00008 110->116 MYRISTYL PDOC00008

PS00008 127->133 MYRISTYL PDOC00008

PS00008 142->148 MYRISTYL PDOC00008



(No Pfam data available for DKFZphutel_24cl9.2)
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DKFZphutel_24ell 



group: intracellular transport and trafficking 

DKFZphutel_24ell encodes a novel 226 amino acid protein with similarity to the human/mouse Golgi
4-transmembrane spanning transporter MTP. MTP may function in the transport of nucleosides
and/or nucleoside derivatives between the cytosol and the lumen of an intracellular
membrane-bound compartment. Thus, the novel protein also seems to be involved in nucleotide sugar
transport.

The new protein can find application in modulating the transport of nucleosides and/or
nucleoside derivatives between the cytosol and the lumen of intracellular membrane-bound
compartments.



Similarity to 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP

complete cDNA, complete cds, EST hits 
potential start at 184
TRANSMEMBRANE 4 

function in the transport of nucleosides and/or nucleoside derivatives
between the cytosol and the lumen of an intracellular membrane-bound compartment?
Sequenced by Qiagen 
Locus: /map 3 '^" 
Insert length: 2005 bp 

Poly A stretch at pos. 1988, polyadenylation signal at pos. 1963 



1 ACGCGTCCGG CAGAAGCTCG GAGCTCTCGG GGTATCGAGG AGGCAGGCCC 

51 GCGGGCGCAC GGGCGAGCGG GCCGGGAGCC GGAGCGGCGG AGGAGCCGGC 

101 AGCAGCGGCG CGGCGGGCTC CAGGCGAGGC GGTCGACGCT CCTGAAAACT 

151 TGCGCGCGCG CTCGCGCCAC TGCGCCCGGA GCGATGAAGA TGGTCGCGCC 

201 CTGGACGCGG TTCTACTCCA ACAGCTGCTG CTTGTGCTGC CATGTCCGCA 

251 CCGGCACCAT CCTGCTCGGC GTCTGGTATC TGATCATCAA TGCTGTGGTA 

301 CTGTTGATTT TATTGAGTGC CCTGGCTGAT CCGGATCAGT ATAACTTTTC 

351 AAGTTCTGAA CTGGGAGGTG ACTTTGAGTT CATGGATGAT GCCAACATGT 

401 GCATTGCCAT TGCGATTTCT CTTCTCATGA TCCTGATATG TGCTATGGCT 

451 ACTTACGGAG CGTACAAGCA ACGCGCAGCC TGGATCATCC CATTCTTCTG 

501 TTACCAGATC TTTGACTTTG CCCTGAACAT GTTGGTTGCA ATCACTGTGC 

551 TTATTTATCC AAACTCCATT CAGGAATACA TACGGCAACT GCCTCCTAAT 

601 TTTCCCTACA GAGATGATGT CATGTCAGTG AATCCTACCT GTTTGGTCCT 

651 TATTATTCTT CTGTTTATTA GCATTATCTT GACTTTTAAG GGTTACTTGA 

701 TTAGCTGTGT TTGGAACTGC TACCGATACA TCAATGGTAG GAACTCCTCT 

751 GATGTCCTGG TTTATGTTAC CAGCAATGAC ACTACGGTGC TGCTACCCCC 

801 GTATGATGAT GCCACTGTGA ATGGTGCTGC CAAGGAGCCA CCGCCACCTT 

851 ACGTGTCTGC CTAAGCCTTC AAGTGGGCGG AGCTGAGGGC AGCAGCTTGA 

901 CTTTGCAGAC ATCTGAGCAA TAGTTCTGTT ATTTCACTTT TGCCATGAGC 

951 CTCTCTGAGC TTGTTTGTTG CTGAAATGCT ACTTTTTAAA ATTTAGATGT 

1001 TAGATTGAAA ACTGTAGTTT TCAACATATG CTTTGCTAGA ACACTGTGAT 

1051 AGATTAACTG TAGAATTCTT CCTGTACGAT TGGGGATATA ACGGGCTTCA 

1101 CTAACCTTCC CTAGGCATTG AAACTTCCCC CAAATCTGAT GGACCTAGAA 

1151 GTCTGCTTTT GTACCTGCTG GGCCCCAAAG TTGGGCATTT TTCTCTCTGT 

1201 TCCCTCTCTT TTGAAAATGT AAAATAAAAC CAAAAATAGA CAACTTTTTC 

1251 TTCAGCCATT CCAGCATAGA GAACAAAACC TTATGGAAAC AGGAATGTCA 

1301 ATTGTGTAAT CATTGTTCTA ATTAGGTAAA TAGAAGTCCT TATGTATGTG

1351 TTACAAGAAT TTCCCCCACA ACATCCTTTA TGACTGAAGT TCAATGACAG 

1401 TTTGTGTTTG GTGGTAAAGG ATTTTCTCCA TGGCCTGAAT TAAGACCATT 

1451 AGAAAGCACC AGGCCGTGGG AGCAGTGACC ATCTACTGAC TGTTCTTGTG 

1501 GATCTTGTGT CCAGGGACAT GGGGTGACAT GCCTCGTATG TGTTAGAGGG 

1551 TGGAATGGAT GTGTTTGGCG CTGCATGGGA TCTGGTGCCC CTCTTCTCCT 

1601 GGATTCACAT CCCCACCCAG GGCCCGCTTT TACTAAGTGT TCTGCCCTAG 
1651 ATTGGTTCAA GGAGGTCATC CAACTGACTT TATCAAGTGG AATTGGGATA 

1701 TATTTGATAT ACTTCTGCCT AACAACATGG AAAAGGGTTT TCTTTTCCCT 

1751 GCAAGCTACA TCCTACTGCT TTGAACTTCC AAGTATGTCT AGTCACCTTT 

1801 TAAAATGTAA ACATTTTCAG AAAAATGAGG ATTGCCTTCC TTGTATGCGC 

1851 TTTTTACCTT GACTACCTGA ATTGCAAGGG ATTTTTATAT ATTCATATGT 

1901 TACAAAGTCA GCAACTCTCC TGTTGGTTCA TTATTGAATG TGCTGTAAAT 

1951 TAAGTCGTTT GCAATTAAAA CAAGGTTTGC CCACATCCAA AAAAAAAAAA 
2001 AAAAA 



BLAST Results 



Entry HS012351 from database EMBL: 
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human STS SHGC-31823. 
Score = 1629, P = 3.1e-67, identities = 343/354



Medline entries 



96199248: 

Identification of a novel membrane transporter 
associated with intracellular membranes by 
phenotypic complementation in the yeast 
Saccharomyces cerevisiae. 



Peptide information for frame 1 



ORF from 184 bp to 861 bp; peptide length: 226 
Category: strong similarity to known protein 



1 MKMVAPWTRF YSNSCCLCCH VRTGTILLGV WYLIINAVVL LILLSALADP 
51 DQYNFSSSEL GGDFEFMDDA NMCIAIAISL LMILICAMAT YGAYKQRAAW 
101 IIPFFCYQIF DFALNMLVAI TVLIYPNSIQ EYIRQLPPNF PYRDDVMSVN 
151 PTCLVLIILL FISIILTFKG YLISCVWNCY RYINGRNSSD VLVYVTSNDT 
201 TVLLPPYDDA TVNGAAKEPP PPYVSA 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_24ell, frame 1 

SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP
(KIAA0108)., N = 1, Score = 551, P = 2.9e-53

SWISSPROT:MTRP_MOUSE GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP., N
= 1, Score = 539, P = 5.3e-52

TREMBL:HS304981_1 product: "E3 protein"; Human retinoic acid-inducible
E3 protein mRNA, complete cds., N = 1, Score = 127, P = 3.4e-06



>SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP
(KIAA0108).

Length = 233

HSPs:

Score = 551 (82.7 bits), Expect = 2.9e-53, P = 2.9e-53
Identities = 102/221 (46%), Positives = 148/221 (66%)

Query:   9 RFYSNSCCLCCHVRTGTILLGVWYLIINAVVLLILLSALADPDQY---NFSSSELGGDF- 64
           RFYS  CC CCHVRTGTI+LG WY+++N ++ ++L   +  P+     N     +G  +
Sbjct:  13 RFYSTRCCGCCHVRTGTIILGTWYMVVNLLMAILLTVEVTHPNSMPAVNIQYEVIGNYYS 72

Query:  65 -EFMDDANMCIAIAISLLMILICAMATYGAYKQRAAWIIPFFCYQIFDFALNMLVAITVL 123
            E M D N C+  A+S+LM +I +M  YGA   +  W+IPFFCY++FDF L+ LVAI+ L
Sbjct:  73 SERMAD-NACVLFAVSVLMFIISSMLVYGAISYQVGWLIPFFCYRLFDFVLSCLVAISSL 131

Query: 124 IYPNSIQEYIRQLPPNFPYRDDVMSVNPTCLVLIILLFISIILTFKGYLISCVWNCYRYI 183
            Y   I+EY+ QLP +FPY+DD+++++ +CL+ I+L+F ++ + FK YLI+CVWNCY+YI
Sbjct: 132 TYLPRIKEYLDQLP-DFPYKDDLLALDSSCLLFIVLVFFALFIIFKAYLINCVWNCYKYI 190

Query: 184 NGRNSSDVLVYVTSN-DTTVLLPPYDDATVNGAAKEPPPPYVSA 226
           N RN  ++ VY         +LP Y+ A V    KEPPPPY+ A
Sbjct: 191 NNRNVPEIAVYPAFEAPPQYVLPTYEMA-VKMPEKEPPPPYLPA 233

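The identity and positive counts in an HSP header can be read straight off the alignment midlines. A letter marks an identical residue and '+' a conservative substitution. The percentages appear to be truncated rather than rounded: 148/221 is 66.97 %, which is printed as 66 %. The sketch below works under those assumptions, and `midline_counts` and `percent` are hypothetical helpers.

```python
def midline_counts(midline: str) -> tuple[int, int]:
    """Return (identities, positives) for one BLAST-style alignment midline.

    A letter marks an identical residue, '+' a conservative substitution,
    and a space a mismatch or a gap.
    """
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


def percent(numerator: int, denominator: int) -> int:
    """Integer percentage, rounded down (reproduces the 46 % and 66 % shown)."""
    return numerator * 100 // denominator


# Example 60-column midline: 52 identical columns plus 3 '+' columns.
print(midline_counts("MT++ D   Q GCCGSLA+YLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL"))  # (52, 55)
print(percent(102, 221), percent(148, 221))  # 46 66
```

Summing `midline_counts` over all midlines of an HSP gives the header totals. Integer division, rather than `round`, is what makes the printed percentages come out as listed.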




Pedant information for DKFZphutel_24ell, frame 1 



Report for DKFZphutel_24ell.1



[LENGTH] 226 

[MW] 25419.11 
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[pI] 4.65

[HOMOL] SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP (KIAA0108) 5e-40

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 1

[PROSITE] ASN_GLYCOSYLATION 3

[KW] SIGNAL_PEPTIDE 49

[KW] TRANSMEMBRANE 2

[KW] LOW_COMPLEXITY 20.80 %



SEQ MKMVAPWTRFYSNSCCLCCHVRTGTILLGVWYLIINAVVLLILLSALADPDQYNFSSSEL 

SEG xxxxxxxxxxxxxxxx 

PRD ccceeeeeeecccceeeeeeeeccceeecceeehhhhhhhhhhhhhhcccccceeecccc 

MEM 

SEQ GGDFEFMDDANMCIAIAISLLMILICAMATYGAYKQRAAWIIPFFCYQIFDFALNMLVAI 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhh 

MEM           MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ TVLIYPNSIQEYIRQLPPNFPYRDDVMSVNPTCLVLIILLFISIILTFKGYLISCVWNCY 

SEG xxxxxxxxxxxxx 

PRD hhhcccchhhhhhhhcccccccccceeeeccccceeehhhhhhhhhhhhhheeeeeeeee 

MEM MMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ RYINGRNSSDVLVYVTSNDTTVLLPPYDDATVNGAAKEPPPPYVSA 

SEG 

PRD eecccccccceeeeeecccccccccccccccccccccccccccccc 

MEM 
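This sketch assumes LOW_COMPLEXITY is simply the share of residues masked by SEG (the x marks). Under that assumption the listed value follows from one division. The three SEG runs above total 47 masked residues, and 47 of 226 gives the 20.80 % reported for DKFZphutel_24ell.1. `low_complexity_percent` is a hypothetical helper.

```python
def low_complexity_percent(seg_lines: list[str], length: int) -> str:
    """Share of SEG-masked residues as a percentage string with two decimals.

    Assumes each 'x' (either case) marks one masked residue and that the
    percentage is taken over the whole peptide length.
    """
    masked = sum(line.lower().count("x") for line in seg_lines)
    return f"{100 * masked / length:.2f}"


# Masks with the same run lengths as the SEG lines above (16 + 18 + 13 = 47).
print(low_complexity_percent(["x" * 16, "x" * 18, "x" * 13, ""], 226))  # 20.80
```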



Prosite for DKFZphutel_24ell.1



PS00001 54->58 ASN_GLYCOSYLATION PDOC00001
PS00001 187->191 ASN_GLYCOSYLATION PDOC00001
PS00001 198->202 ASN_GLYCOSYLATION PDOC00001
PS00005 167->170 PKC_PHOSPHO_SITE PDOC00005
PS00006 56->60 CK2_PHOSPHO_SITE PDOC00006
PS00006 128->132 CK2_PHOSPHO_SITE PDOC00006
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006
PS00007 186->195 TYR_PHOSPHO_SITE PDOC00007



(No Pfam data available for DKFZphutel_24ell.1)
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DKFZphutel_24j6 



group: cell structure and motility 

DKFZphutel_24j6 encodes a novel 571 amino acid protein with strong similarity to rat cell
adhesion regulator (CAR1).

The novel protein is very similar to CAR1 and thus seems to be involved in the regulation of
cell-cell adhesion. It contains an RGD cell attachment site.

The new protein can find application in the modulation of cell-cell adhesion.

strong similarity to rat CAR1 and A. thaliana T19C21.5

complete cDNA, complete cds, EST hits 

potential frame shift at Bp 1241 according to CAR1 

but the frame shift might be in the CAR1 sequence!

ESTs T73366 AA362984 confirm this sequence 

Sequenced by Qiagen 

Locus: /map="939.9 cR from top of Chr2 linkage group"
Insert length: 3333 bp 

Poly A stretch at pos. 3316, no polyadenylation signal found 

1 ACGCGTCCGA GCTGGCTCAG GGCGTCCGCT AGGCTCGGAC GACCTGCTGA 

51 GCCTCCCAAA CCGCTTCCAT AAGGCTTTGC CTTTCCAACT TCAGCTACAG 

101 TGTTAGCTAA GTTTGGAAAG AAGGAAAAAA GAAAATCCCT GGGCCCCTTT 

151 TCTTTTGTTC TTTGCCAAAG TCGTCGTTGT AGTCTTTTTG CCCAAGGCTG 

201 TTGTGTTTTT AGAGGTGCTA TCTCCAGTTC CTTGCACTCC TGTTAACAAG 

251 CACCTCAGCG AGAGCAGCAG CAGCGATAGC AGCCGCAGAA GAGCCAGCGG 

301 GGTCGCCTAG TGTCATGACC AGGGCGGGAG ATCACAACCG CCAGAGAGGA 

351 TGCTGTGGAT CCTTGGCCGA CTACCTGACC TCTGCAAAAT TCCTTCTCTA 

401 CCTTGGTCAT TCTCTCTCTA CTTGGGGAGA TCGGATGTGG CACTTTGCGG 

451 TGTCTGTGTT TCTGGTAGAG CTCTATGGAA ACAGCCTCCT TTTGACAGCA 

501 GTCTACGGGC TGGTGGTGGC AGGGTCTGTT CTGGTCCTGG GAGCCATCAT 

551 CGGTGACTGG GTGGACAAGA ATGCTAGACT TAAAGTGGCC CAGACCTCGC 

601 TGGTGGTACA GAATGTTTCA GTCATCCTGT GTGGAATCAT CCTGATGATG

651 GTTTTCTTAC ATAAACATGA GCTTCTGACC ATGTACCATG GATGGGTTCT 

701 CACTTCCTGC TATATCCTGA TCATCACTAT TGCAAATATT GCAAATTTGG 

751 CCAGTACTGC TACTGCAATC ACAATCCAAA GGGATTGGAT TGTTGTTGTT 

801 GCAGGAGAAG ACAGAAGCAA ACTAGCAAAT ATGAATGCCA CAATACGAAG 

851 GATTGACCAG TTAACCAACA TCTTAGCCCC CATGGCTGTT GGCCAGATTA 

901 TGACATTTGG CTCCCCAGTC ATCGGCTGTG GCTTTATTTC GGGATGGAAC 

951 TTGGTATCCA TGTGCGTGGA GTACGTCCTG CTCTGGAAGG TTTACCAGAA 

1001 AACCCCAGCT CTAGCTGTGA AAGCTGGTCT TAAAGAAGAG GAAACTGAAT 

1051 TGAAACAGCT GAATTTACAC AAAGATACTG AGCCAAAACC CCTGGAGGGA 

1101 ACTCATCTAA TGGGTGTGAA AGACTCTAAC ATCCATGAGC TTGAACATGA 

1151 GCAAGAGCCT ACTTGTGCCT CCCAGATGGC TGAGCCCTTC CGTACCTTCC 

1201 GAGATGGATG GGTCTCCTAC TACAACCAGC CTGTGTTTCT GGCTGGCATG 

1251 GGTCTTGCTT TCCTTTATAT GACTGTCCTG GGCTTTGACT GCATCACCAC 

1301 AGGGTACGCC TACACTCAGG GACTGAGTGG TTCCATCCTC AGTATTTTGA 

1351 TGGGAGCATC AGCTATAACT GGAATAATGG GAACTGTAGC TTTTACTTGG 

1401 CTACGTCGAA AATGTGGTTT GGTTCGGACA GGTCTGATCT CAGGATTGGC 

1451 ACAGCTTTCC TGTTTGATCT TGTGTGTGAT CTCTGTATTC ATGCCTGGAA

1501 GCCCCCTGGA CTTGTCCGTT TCTCCTTTTG AAGATATCCG ATCAAGGTTC 

1551 ATTCAAGGAG AGTCAATTAC ACCTACCAAG ATACCTGAAA TTACAACTGA 

1601 AATATACATG TCTAATGGGT CTAATTCTGC TAATATTGTC CCGGAGACAA 

1651 GTCCTGAATC TGTGCCCATA ATCTCTGTCA GTCTGCTGTT TGCAGGCGTC 

1701 ATTGCTGCTA GAATCGGTCT TTGGTCCTTT GATTTAACTG TGACACAGTT 

1751 GCTGCAAGAA AATGTAATTG AATCTGAAAG AGGCATTATA AATGGTGTAC 

1801 AGAACTCCAT GAACTATCTT CTTGATCTTC TGCATTTCAT CATGGTCATC 

1851 CTGGCTCCAA ATCCTGAAGC TTTTGGCTTG CTCGTATTGA TTTCAGTCTC 

1901 CTTTGTGGCA ATGGGCCACA TTATGTATTT CCGATTTGCC CAAAATACTC 

1951 TGGGAAACAA GCTCTTTGCT TGCGGTCCTG ATGCAAAAGA AGTTAGGAAG 

2001 GAAAATCAAG CAAATACATC TGTTGTTTGA GACAGTTTAA CTGTTGCTAT 

2051 CCTGTTACTA GATTATATAG AGCACATGTG CTTATTTTGT ACTGCAGAAT 

2101 TCCAATAAAT GGCTGGGTGT TTTGCTCTGT TTTTACCACA GCTGTGCCTT 

2151 GAGAACTAAA AGCTGTTTAG GAAACCTAAG TCAGCAGAAA TTAACTGATT 

2201 AATTTCCCTT ATGTTGAGGC ATGGAAAAAA AATTGGAAAA GAAAAACTCA 

2251 GTTTAAATAC GGAGACTATA ATGATAACAC TGAATTCCCC TATTTCTCAT 

2301 GAGTAGATAC AATCTTACGT AAAAGAGTGG TTAGTCACGT GAATTCAGTT 

2351 ATCATTTGAC AGATTCTTAT CTGTACTAGA ATTCAGATAT GTCAGTTTTC 

2401 TGCAAAACTC ACTCTTGTTC AAGACTAGCT AATTTATTTT TTTGCATCTT 

2451 AGTTATTTTT AAAAACAAAT TCTTCAAGTA TGAAGACTAA ATTTTGATAA 

2501 CTAATATTAT CCTTATTGAT CCTATTGATC TTAAGGTATT TACATGTATG 
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2551 TGGAAAAACA AAACACTTAA CTAGAATTCT CTAATAAGGT TTATGGTTTA

2601 GCTTAAAGAG CACCTTTGTA TTTTTATTAT CAGATGGGGC AACATATTGT

2651 ATGAAGCATA TGTAGCACTT CACAGCATGG TTATCATGTA AGCTGCAGGT

2701 AGAAGCAAAG CTGTAAAGTA GATTTATCAC ACAATGACTG CATACAGACT

2751 TCAAATATGT CAATAGTTTG GTCATAGAAC CTAGAAGCCA AAAGCCACAC

2801 AGAAGGGCAA GAATCCCAAT TTAACTCATG TTATCATCAT TAGTGATCTG

2851 TGTTGTAGAA CATGAGGGTG TAAGCCTTCA GCCTGGCAAG TTACATGTAG

2901 AAAGCCCACA CTTGTGAAGG TTTTGTTTTA CAAATCACTT GATTTAACAC

2951 ACTCAGGTAG AATATTTTTA TTTTTACTGT TTTATACCCA GAAGTTATTT

3001 CTACATTGTT CTACAGCAAG AATATTCATA AAAGTATCCC TTTCAAATGC

3051 CTTTGAGAAG AATAGAAGAA AAAAAGTTTG TATATATTTT AAAAAATTGT

3101 TTTAAAAGTC AGTTTGCAAC ATGTCTGTAC CAAGATGGTA CTTTGCCTTA

3151 ACCGTTTATA TGCACTTTCA TGGAGACTGC AATACGTTGC TATGAGCACT

3201 TTCTTTATCC TTGGAGTTTA ATCCTTTGCT TCATCTTTCT ACAGTATGAC

3251 ATAATGATTT GCTATGTTGT AAAATCTTTG TAAAAAATTT CTATATAAAA

3301 ATATTTTGAA AATCTTAAAA AAAAAAAAAA AAA



BLAST Results 



Entry HS389210 from database EMBL: 
human STS SHGC-10164. 
Score = 1592, P = 1.5e-64, identities = 346/364

Entry HS933343 from database EMBL:
human STS WI-16551.
Score = 1193, P = 5.7e-46, identities = 241/244



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 315 bp to 2027 bp; peptide length: 571 
Category: strong similarity to known protein 



1 MTRAGDHNRQ RGCCGSLADY LTSAKFLLYL GHSLSTWGDR MWHFAVSVFL
51 VELYGNSLLL TAVYGLVVAG SVLVLGAIIG DWVDKNARLK VAQTSLVVQN
101 VSVILCGIIL MMVFLHKHEL LTMYHGWVLT SCYILIITIA NIANLASTAT
151 AITIQRDWIV VVAGEDRSKL ANMNATIRRI DQLTNILAPM AVGQIMTFGS
201 PVIGCGFISG WNLVSMCVEY VLLWKVYQKT PALAVKAGLK EEETELKQLN
251 LHKDTEPKPL EGTHLMGVKD SNIHELEHEQ EPTCASQMAE PFRTFRDGWV
301 SYYNQPVFLA GMGLAFLYMT VLGFDCITTG YAYTQGLSGS ILSILMGASA
351 ITGIMGTVAF TWLRRKCGLV RTGLISGLAQ LSCLILCVIS VFMPGSPLDL
401 SVSPFEDIRS RFIQGESITP TKIPEITTEI YMSNGSNSAN IVPETSPESV
451 PIISVSLLFA GVIAARIGLW SFDLTVTQLL QENVIESERG IINGVQNSMN
501 YLLDLLHFIM VILAPNPEAF GLLVLISVSF VAMGHIMYFR FAQNTLGNKL
551 FACGPDAKEV RKENQANTSV V



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_24j6, frame 3

TREMBLNEW:U76714_1 gene: "CAR1"; product: "cell adhesion regulator";
Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds., N
= 1, Score = 1472, P = 7.2e-151

TREMBL:AC004683_5 gene: "T19C21.5"; Arabidopsis thaliana chromosome II
BAC T19C21 genomic sequence, complete sequence., N = 2, Score = 437, P
= 2.8e-60



TREMBL:AF039046_2 gene: "R09B5.4"; Caenorhabditis elegans cosmid
R09B5., N = 2, Score = 323, P = 1.5e-43



>TREMBLNEW:U76714_1 gene: "CAR1"; product: "cell adhesion regulator";
Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds.
Length = 405
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HSPs: 



Score = 1472 (220.9 bits), Expect = 7.2e-151, P = 7.2e-151
Identities = 288/319 (90%), Positives = 297/319 (93%)

Query:   1 MTRAGDHNRQRGCCGSLADYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 60
           MT++ D   Q GCCGSLA+YLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL
Sbjct:   1 MTKSRDQTHQEGCCGSLANYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 60

Query:  61 TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKHEL 120
           TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHK+EL
Sbjct:  61 TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKNEL 120

Query: 121 LTMYHGWVLTSCYILIITIANIANLASTATAITIQRDWIVVVAGEDRSKLANMNATIRRI 180
           L MYHGWVLT CYILIITIANIANLASTATAITIQRDWIVVVAGE+RS+LA+MNATIRRI
Sbjct: 121 LNMYHGWVLTVCYILIITIANIANLASTATAITIQRDWIVVVAGENRSRLADMNATIRRI 180

Query: 181 DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYVLLWKVYQKTPALAVKAGLK 240
           DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEY LLWKVYQKTPALAVKA LK
Sbjct: 181 DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYFLLWKVYQKTPALAVKAALK 240

Query: 241 EEETELKQLNLHKDTEPKPLEGTHLMGVKDSNIHELEHEQEPTCASQMAEPFRTFRDGWV 300
            EE+ELKQL   KDTEPKPLEGTHLMG KDSNI ELE EQEPTCASQ+AEPFRTFRDGWV
Sbjct: 241 VEESELKQLTSPKDTEPKPLEGTHLMGEKDSNIRELECEQEPTCASQIAEPFRTFRDGWV 300

Query: 301 SYYNQPVFLAGMGLAF-LY 318
           SYYNQPVFL   G  F LY
Sbjct: 301 SYYNQPVFLGWHGPGFPLY 319





Pedant information for DKFZphutel_24j6, frame 3



Report for DKFZphutel_24j6.3



[LENGTH] 571

[MW] 62542.72

[pI] 6.08

[HOMOL] TREMBL:U76714_1 gene: "CAR1"; product: "cell adhesion regulator"; Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds. 1e-141

[BLOCKS] BL00341D

[PROSITE] MYRISTYL 15

[PROSITE] MITOCH_CARRIER 1

[PROSITE] CK2_PHOSPHO_SITE 6

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] PKC_PHOSPHO_SITE 4

[PROSITE] ASN_GLYCOSYLATION 4

[PFAM] Laminin B (Domain IV)

[KW] TRANSMEMBRANE 4

[KW] LOW_COMPLEXITY 8.76 %



SEQ MTRAGDHNRQRGCCGSLADYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL

SEG 

PRD ccccccccccccccccchhhhhhhheeeeccceeecccchhhhhhhhheeeeecccccee 

MEM MMMMMMMMMMMMM 

SEQ TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKHEL

SEG . xxxxxxxxxxxxxxxx 

PRD ehhhhhhhccceeeeccccccchhhhhhhhhhhhheeeccchhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ LTMYHGWVLTSCYILIITIANIANLASTATAITIQRDWIVVVAGEDRSKLANMNATIRRI 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD hhcccccchhhhhhhhhhhhhhhhhhhhhheeeeccceeeeeeccccchhhhhhhhhhhh 

MEM MMMMMMM 

SEQ DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYVLLWKVYQKTPALAVKAGLK 

SEG 

PRD hhhhhhccceeeceeeeeecceeeeeeeeccchhhhhhhhhhhhhhhcccchhhhhhhhh 

MEM 

SEQ EEETELKQLNLHKDTEPKPLEGTHLMGVKDSNIHELEHEQEPTCASQMAEPFRTFRDGWV 

SEG 

PRD hhhhhhhhhhccccccccccceeeeeecccccccccccccccccccccccccccccccee 

MEM 

SEQ SYYNQPVFLAGMGLAFLYMTVLGFDCITTGYAYTQGLSGSILSILMGASAITGIMGTVAF 

SEG 

PRD eeecceeeecccchhhhhhcccccceeeeeeeeccccceeeeeeecccceeeeehhhhhh 
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MEM 



SEQ TWLRRKCGLVRTGLISGLAQLSCLILCVISVFMPGSPLDLSVSPFEDIRSRFIQGESITP 

SEG xxx 

PRD hhhhhhccccccccchhhhhhhhhhhhhhhhcccccccccccccchhhhhhccccccccc

MEM 

SEQ TKIPEITTEIYMSNGSNSANIVPETSPESVPIISVSLLFAGVIAARIGLWSFDLTVTQLL 

SEG xxxxxxxxxx 

PRD ccccccceeeeecccccccccccccccccceeeeeehhhhhhhhhhcccchhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ QENVIESERGIINGVQNSMNYLLDLLHFIMVILAPNPEAFGLLVLISVSFVAMGHIMYFR

SEG 

PRD hhhhhccccceeeecccchhhhhhhhhhheeeeeccccccceeeeeeeeccccccceeee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ FAQNTLGNKLFACGPDAKEVRKENQANTSVV 

SEG 

PRD eecccccceeeeccccchhhhhhhhcccccc 

MEM 
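The MEM lines mark residues predicted to lie in a transmembrane segment with M, one column per residue of the SEQ line above. The sketch below recovers segment boundaries from such masks. It assumes 60 residues per line and masks space-padded to keep their columns, and the example mask is hypothetical. `mask_runs` is a hypothetical helper.

```python
import re


def mask_runs(mem_lines: list[str], line_width: int = 60) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) ranges of consecutive 'M' marks.

    Assumes each MEM line covers `line_width` residues and is space-padded so
    that column i lines up with residue i of the SEQ line above it.
    """
    # Pad (or trim) every line to full width so positions carry across lines.
    track = "".join(line.ljust(line_width)[:line_width] for line in mem_lines)
    return [(m.start() + 1, m.end()) for m in re.finditer(r"M+", track)]


# Hypothetical mask: one segment starting at residue 107 and running into line 3.
print(mask_runs(["", " " * 46 + "M" * 14, "MMM"]))  # [(107, 123)]
```

Joining the padded lines before matching lets a segment that wraps across a line break come out as one range rather than two.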



Prosite for DKFZphutel_24j6.3



PS00001 100->104 ASN_GLYCOSYLATION PDOC00001

PS00001 174->178 ASN_GLYCOSYLATION PDOC00001

PS00001 434->438 ASN_GLYCOSYLATION PDOC00001

PS00001 567->571 ASN_GLYCOSYLATION PDOC00001

PS00005 23->26 PKC_PHOSPHO_SITE PDOC00005

PS00005 176->179 PKC_PHOSPHO_SITE PDOC00005

PS00005 294->297 PKC_PHOSPHO_SITE PDOC00005

PS00005 487->490 PKC_PHOSPHO_SITE PDOC00005

PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006

PS00006 36->40 CK2_PHOSPHO_SITE PDOC00006

PS00006 294->298 CK2_PHOSPHO_SITE PDOC00006

PS00006 396->400 CK2_PHOSPHO_SITE PDOC00006

PS00006 403->407 CK2_PHOSPHO_SITE PDOC00006

PS00006 445->449 CK2_PHOSPHO_SITE PDOC00006

PS00008 12->18 MYRISTYL PDOC00008

PS00008 65->71 MYRISTYL PDOC00008

PS00008 76->82 MYRISTYL PDOC00008

PS00008 193->199 MYRISTYL PDOC00008

PS00008 267->273 MYRISTYL PDOC00008

PS00008 311->317 MYRISTYL PDOC00008

PS00008 336->342 MYRISTYL PDOC00008

PS00008 339->345 MYRISTYL PDOC00008

PS00008 353->359 MYRISTYL PDOC00008

PS00008 368->374 MYRISTYL PDOC00008

PS00008 373->379 MYRISTYL PDOC00008

PS00008 435->441 MYRISTYL PDOC00008

PS00008 461->467 MYRISTYL PDOC00008

PS00008 490->496 MYRISTYL PDOC00008

PS00008 494->500 MYRISTYL PDOC00008

PS00013 122->133 PROKAR_LIPOPROTEIN PDOC00013

PS00215 404->414 MITOCH_CARRIER PDOC00189



Pfam for DKFZphutel_24j6.3



HMM_NAME Laminin B (Domain IV) 

HMM * YWR1 PERFLGDQvTs YGGkLe * 

Y+R + LG+++ + G + + 
Query 538 YFRFAQNTLGNKLFACGPDAK 558 
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DKFZphutel_2h3 



group: differentiation/development

DKFZphutel_2h3 encodes a novel 267 amino acid protein, with similarity to ITM2 (integral 
membrane protein 2) of chicken and mouse. 

The novel protein contains a prenyl group binding site (CAAX box) and seems to be
post-translationally modified by the attachment of either a farnesyl or a geranylgeranyl group.
The similar Gallus gallus protein E25 is a marker for chondro-osteogenic differentiation.

The new protein can find application as a useful marker for chondro-osteogenic cell 
differentiation and for the modulation of chondro-osteogenic cell differentiation. 
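The CAAX box mentioned above is a C-terminal motif: a cysteine, two aliphatic residues and a final residue X. X is commonly used as a rough predictor of which prenyl group is attached. The sketch below implements a simplified version of that rule of thumb. The regex, the residue sets and `prenylation_guess` are assumptions for illustration, not the analysis used for this entry.

```python
import re
from typing import Optional

# Simplified CaaX motif at the very C-terminus: Cys, two aliphatic residues, then X.
CAAX = re.compile(r"C[AILV][AILV]([A-Z])$")


def prenylation_guess(peptide: str) -> Optional[str]:
    """Rough guess of the prenyl group from the CaaX X residue (rule of thumb only)."""
    match = CAAX.search(peptide)
    if match is None:
        return None  # no C-terminal CaaX box
    # X = Leu (sometimes Phe) tends to direct geranylgeranylation; other X
    # residues (e.g. Ser, Met, Gln, Ala, Cys) tend to direct farnesylation.
    return "geranylgeranyl" if match.group(1) in ("L", "F") else "farnesyl"


print(prenylation_guess("MKKCVIL"))  # geranylgeranyl
print(prenylation_guess("MKKCVLS"))  # farnesyl
```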



strong similarity to mouse E25 and gallus E3-16 
complete cDNA, EST hits 

complete cds according to E25 start at Bp 56 
putative transmembrane protein (1 TM) 

Sequenced by AGOWA 

Locus: unknown

Insert length: 2033 bp 

Poly A stretch at pos. 2007, polyadenylation signal at pos. 1986 



1 GGACCGAGGC TGCACCGGCA GAGGCTGCGG GGCGGACGCG CGGGCCGGCG

51 CAGCCATGGT GAAGATTAGC TTCCAGCCCG CCGTGGCTGG CATCAAGGGC

101 GACAAGGCTG ACAAGGCGTC GGCGTCGGCC CCTGCGCCGG CCTCGGCCAC

151 CGAGATCCTG CTGACGCCGG CTAGGGAGGA GCAGCCCCCA CAACATCGAT

201 CCAAGAGGGG GAGCTCAGTG GGCGGCGTGT GCTACCTGTC GATGGGCATG

251 GTCGTGCTGC TCATGGGCCT CGTGTTCGCC TCTGTCTACA TCTACAGATA

301 CTTCTTTCTT GCACAGCTGG CCCGAGATAA CTTCTTCCGC TGTGGTGTGC

351 TGTATGAGGA CTCCCTGTCC TCCCAGGTCC GGACTCAGAT GGAGCTGGAA

401 GAGGATGTGA AAATCTACCT CGACGAGAAC TACGAGCGCA TCAACGTGCC

451 TGTGCCCCAG TTTGGCGGCG GTGACCCTGC AGACATCATC CATGACTTCC

501 AGCGGGGTCT GACTGCGTAC CATGATATCT CCCTGGACAA GTGCTATGTC

551 ATCGAACTCA ACACCACCAT TGTGCTGCCC CCTCGCAACT TCTGGGAGCT

601 CCTCATGAAC GTGAAGAGGG GGACCTACCT GCCGCAGACG TACATCATCC

651 AGGAGGAGAT GGTGGTCACG GAGCATGTCA GTGACAAGGA GGCCCTGGGG

701 TCCTTCATCT ACCACCTGTG CAACGGGAAA GACACCTACC GGCTCCGGCG

751 CCGGGCAACG CGGAGGCGGA TCAACAAGCG TGGGGCCAAG AACTGCAATG

801 CCATCCGCCA CTTCGAGAAC ACCTTCGTGG TGGAGACGCT CATCTGCGGG

851 GTGGTGTGAG GCCCTCCTCC CCCAGAACCC CCTGCCGTGT TCCTCTTTTC

901 TTCTTTCCAG CTGCTCTCTG GCCCTCCTCC TTCCCCCTGC TTAGCTTGTA

951 CTTTGGACGC GTTTCTATAG AGGTGACATG TCTCTCCATT CCTCTCCAAC

1001 CCTGCCCACC TCCCTGTACC AGAGCTGTGA TCTCTCGGTG GGGGGCCCAT

1051 CTCTGCTGAC CTGGGTGTGG CGGAGGGAGA GGCGATGCTG CAAAGTGTTT

1101 TCTGTGTCCC ACTGTCTTGA AGCTGGGCCT GCCAAAGCCT GGGCCCACAG

1151 CTGCACCGGC AGCCCAAGGG GAAGGACCGG TTGGGGGAGC CGGGCATGTG

1201 AGGCCCTGGG CAAGGGGATG GGGCTGTGGG GGCGGGGCGG CATGGGCTTC

1251 AGAAGTATCT GCACAATTAG AAAAGTCCTC AGAAGCTTTT TCTTGGAGGG

1301 TACACTTTCT TCACTGTCCC TATTCCTAGA CCTGGGGCTT GAGCTGAGGA

1351 TGGGACGATG TGCCCAGGGA GGGACCCACC AGAGCACAAG AGAAGGTGGC

1401 TACCTGGGGG TGTCCCAGGG ACTCTGTCAG TGCCTTCAGC CCACCAGCAG

1451 GAGCTTGGAG TTTGGGGAGT GGGGATGAGT CCGTCAAGCA CAACTGTTCT

1501 CTGAGTGGAA CCAAAGAAGC AAGGAGCTAG GACCCCCAGT CCTGCCCCCC

1551 AGGAGCACAA GCAGGGTCCC CTCAGTCAAG GCAGTGGGAT GGGCGGCTGA

1601 GGAACGGGGC AGGCAAGGTC ACTGCTCAGT CACGTCCACG GGGGACGAGC

1651 CGTGGGTTCT GCTGAGTAGG TGGAGCTCAT TGCTTTCTCC AAGCTTGGAA

1701 CTGTTTTGAA AGATAACACA GAGGGAAAGG GAGAGCCACC TGGTACTTGT

1751 CCACCCTGCC TCCTCTGTTC TGAAATTCCA TCCCCCTCAG CTTAGGGGAA

1801 TGCACCTTTT TCCCTTTCCT TCTCACTTTT GCATGTTTTT ACTGATCATT

1851 CGATATGCTA ACCGTTCTCA GCCCTGAGCC TTGGAGAGGA GGGCTGTAAC

1901 GCCTTCAGTC AGTCTCTGGG GATGAAACTC TTAAATGCTT TGTATATTTT

1951 CTCAATTAGA TCTCTTTTCA GAAGTGTCTA TAGAACAATA AAAATCTTTT

2001 ACTTCTGAAA AAAAAAAAAA AAAAGGGCGG CCG



BLAST Results 



Entry B64417 from database EMBL: 

CIT-HSP-2023A7.TR CIT-HSP Homo sapiens genomic clone 2023A7. 
Length =715 
Plus Strand HSPs: 
Score = 1546 (232.0 bits), Expect = 7.8e-64, P = 7.8e-64 
Identities = 310/311 (99%) 



Medline entries 



96325063: 

Isolation of markers for chondro-osteogenic differentiation using cDNA library subtraction. Molecular cloning and characterization of a gene belonging to a novel multigene family of integral membrane proteins. 



Peptide information for frame 2 



ORF from 56 bp to 856 bp; peptide length: 267 
Category: strong similarity to known protein 
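The ORF coordinates and the peptide length are related by codon arithmetic. From 56 to 856 bp is 801 bp, which is 267 codons. This means the reported ORF ends on the last sense codon, and the TGA at 857-859 lies outside it. That reading is an inference from the cDNA sequence, not something the report states. The sketch below uses the standard genetic code to recover the N- and C-terminal residues from the cDNA. The helper names translate and peptide_length are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna: str) -> str:
    """Translate in frame from the first base; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residues encoded by an ORF given as 1-based, inclusive bp coordinates,
    assuming the stop codon lies outside the ORF (as it appears to here)."""
    span = orf_end - orf_start + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3

print(peptide_length(56, 856))                                      # 267
print(translate("ATGGTGAAGATTAGCTTCCAGCCCGCCGTGGCTGGCATCAAGGGC"))  # MVKISFQPAVAGIKG
print(translate("TGCGGGGTGGTGTGA"))                                 # CGVV*
```

The two DNA strings are bases 56-100 and 845-859 of DKFZphutel_2h3. The same length arithmetic also fits the other clones in this listing: 110-1288 gives 393 residues and 49-981 gives 311.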



1 MVKISFQPAV AGIKGDKADK ASASAPAPAS ATEILLTPAR EEQPPQHRSK 
51 RGSSVGGVCY LSMGMVVLLM GLVFASVYIY RYFFLAQLAR DNFFRCGVLY 
101 EDSLSSQVRT QMELEEDVKI YLDENYERIN VPVPQFGGGD PADIIHDFQR 
151 GLTAYHDISL DKCYVIELNT TIVLPPRNFW ELLMNVKRGT YLPQTYIIQE 
201 EMVVTEHVSD KEALGSFIYH LCNGKDTYRL RRRATRRRIN KRGAKNCNAI 
251 RHFENTFVVE TLICGVV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_2h3, frame 2 

SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN E3-16)., N = 1, Score = 573, P = 1.3e-55 

SWISSNEW:ITMB_MOUSE INTEGRAL MEMBRANE PROTEIN 2B (E25B PROTEIN)., N = 1, Score = 560, P = 3.2e-54 

SWISSNEW:ITMA_HUMAN INTEGRAL MEMBRANE PROTEIN 2A (E25 PROTEIN)., N = 1, Score = 456, P = 3.3e-43 



>SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN 
E3-16) . 

Length = 262 

HSPs: 



Score = 573 (86.0 bits), Expect = 1.3e-55, P = 1.3e-55 
Identities = 117/264 (44%), Positives = 172/264 (65%) 
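The percentages in parentheses can be reproduced from the raw counts. The counts below are taken from this listing. Judging from those counts, the printed values are truncated rather than rounded: 310/311 is 99.68% but is printed as 99%. The report itself does not state this rule. The helper name blast_percent is hypothetical.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage, truncated rather than rounded."""
    return (100 * matches) // aligned_length

# (matches, aligned length, percentage printed in this listing)
REPORTED = [
    (310, 311, 99),   # B64417, identities
    (117, 264, 44),   # ITMB_CHICK, identities
    (172, 264, 65),   # ITMB_CHICK, positives
    (115, 269, 42),   # S49915, positives
    (81, 254, 31),    # S49915, identities of a later HSP
]

for m, n, printed in REPORTED:
    print(f"{m}/{n}: exact {100 * m / n:.2f}%, "
          f"truncated {blast_percent(m, n)}%, printed {printed}%")
```

Ordinary rounding would instead give 100%, 43% and 32% for three of these rows, which would not match the printed values.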



Query: 1 MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGSSVGGVCY 60 

MVK+SF A+A + A+K ++ ++L+ P ++P G 

Sbjct: 1 MVKVSFNSALA— HKEAANKEEENS Q VL ILPPDAKEPE D VVV P AGH KRAWCWC 51 

Query: 61 LSMGMVVLLMGLVFASVYI YRYFFLAQLARDNFFRCGVLY-EDSLS SQVRTQM-- 112 

+ G+ +L G++ Y+Y+YF Q + CG+ Y ED LS +Q+++ 

Sbjct: 52 MCFGLAFMLAGVILGGAYLYKYFAFQQ GGVYFCGIKYIEDGLSLPESGAQLKSARYH 108 

Query: 113 ELEEDVKIYLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTI 172 

+E++++I +E+ E I+VPVP+F DPADI+HDF R LTAY D+SLDKCYVI LNT++ 

Sbjct: 109 TIEQNIQILEEEDVEFISVPVPEFADSDPADIVHDFHRRLTAYLDLSLDKCYVIPLNTSV 168 

Query: 173 VLPPRNFWELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRR 232 

V+PP+NF ELL+N+K GTYLPQ+Y+I E+M+VT+ + + + LG FIY LC GK+TY+L+R 

Sbjct: 169 VMPPKNFLELLINIKAGTYLPQSYLIHEQMIVTDRIENVDQLGFFIYRLCRGKETYKLQR 228 

Query: 233 RATRRRINKRGAKNCNAIRHFENTFVVETLIC 264 

+ + I KR A NC IRHFEN F +ETLIC 

Sbjct: 229 KEAMKGIQKREAVNCRKIRHFENRFAMETLIC 260 








Pedant information for DKFZphutel_2h3, frame 2 
Report for DKFZphutel_2h3.2 



[LENGTH] 267 

[MW] 30253.96 

[pI] 8.16 

[HOMOL] SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN E3-16). 1e-49 

[PROSITE] MYRISTYL 4 

[PROSITE] PRENYLATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 4 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 15.36 % 
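The [MW] field is essentially the sum of the residue masses plus one water for the free termini. The sketch below assumes an ExPASy-style table of average residue masses. The constants used by the pipeline that produced this report are not given here, so a result for the full 267-residue peptide may differ slightly from 30253.96. The helper name average_mw is hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water). Assumption:
# an ExPASy-style average-mass table; the program that produced the [MW]
# field may have used slightly different constants.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # one water added back for the free N- and C-termini

def average_mw(protein: str) -> float:
    """Average molecular weight of an unmodified linear peptide, in daltons."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER

print(round(average_mw("G"), 2))  # 75.07, free glycine
print(round(average_mw("W"), 2))  # 204.23, free tryptophan
```

Post-translational modifications such as the prenyl group discussed above are not included. The calculated value therefore refers to the bare polypeptide.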



SEQ MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGSSVGGVCY 

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccchhhhhhhhhhhhhhhhhccccccceeecccccccccccccccccccccchh 

MEM MMMM 

SEQ LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLYEDSLSSQVRTQMELEEDVKI 

SEG . .xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhccceeeeeecccccccchhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ YLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTIVLPPRNFW 

SEG 

PRD hhcccceeeeccccccccccccchhhhhhhhhhhhhhhcccceeeeeccceeecccchhh 

MEM 

SEQ ELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRRRATRRRIN 

SEG xxxxxxxxxxxx 

PRD hhhhhhcccccccceeeeehhhhhhhccccchhhhhheeeccccchhhhhhhhhhhhhhh 

MEM 

SEQ KRGAKNCNAIRHFENTFVVETLICGVV 

SEG xx 

PRD hhhhccceeeecccchhhhhheeeccc 

MEM 



Prosite for DKFZphutel_2h3.2 

PS00001 169->173 ASN_GLYCOSYLATION PDOC00001 

PS00004 50->54 CAMP_PHOSPHO_SITE PDOC00004 

PS00004 187->191 CAMP_PHOSPHO_SITE PDOC00004 

PS00004 232->236 CAMP_PHOSPHO_SITE PDOC00004 

PS00005 49->52 PKC_PHOSPHO_SITE PDOC00005 

PS00005 209->212 PKC_PHOSPHO_SITE PDOC00005 

PS00005 227->230 PKC_PHOSPHO_SITE PDOC00005 

PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005 

PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006 

PS00006 110->114 CK2_PHOSPHO_SITE PDOC00006 

PS00006 209->213 CK2_PHOSPHO_SITE PDOC00006 

PS00007 119->127 TYR_PHOSPHO_SITE PDOC00007 

PS00008 52->58 MYRISTYL PDOC00008 

PS00008 71->77 MYRISTYL PDOC00008 

PS00008 138->144 MYRISTYL PDOC00008 

PS00008 243->249 MYRISTYL PDOC00008 

PS00294 264->268 PRENYLATION PDOC00266 
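Each row above gives a PROSITE accession, the matched span, the site name and the PROSITE documentation entry. These positions can be reproduced by converting PROSITE patterns into regular expressions and scanning the peptide. The sketch below assumes the published PROSITE patterns for five of the reported signatures. The helper names prosite_to_regex and scan are hypothetical. Its output matches the table if a span such as 52->58 is read as a 1-based start plus an end one past the last residue (the MYRISTYL pattern is six residues long). That convention is inferred here, not stated in the report.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Handles single residues (N), any residue (x), allowed sets ([ST]),
    forbidden sets ({P}) and repeats such as x(2) or x(2,3). The terminal
    anchors < and > are deliberately not handled in this sketch.
    """
    element_re = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for element in pattern.split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)

def scan(protein: str, pattern: str):
    """All hits, overlapping ones included, as (start, end): 1-based start,
    end = start + motif length (the convention the table appears to use)."""
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + 1 + len(m.group(1)))
            for m in rx.finditer(protein)]

# DKFZphutel_2h3 peptide (frame 2), copied from this listing.
HUTEL_2H3 = (
    "MVKISFQPAV" "AGIKGDKADK" "ASASAPAPAS" "ATEILLTPAR" "EEQPPQHRSK"
    "RGSSVGGVCY" "LSMGMVVLLM" "GLVFASVYIY" "RYFFLAQLAR" "DNFFRCGVLY"
    "EDSLSSQVRT" "QMELEEDVKI" "YLDENYERIN" "VPVPQFGGGD" "PADIIHDFQR"
    "GLTAYHDISL" "DKCYVIELNT" "TIVLPPRNFW" "ELLMNVKRGT" "YLPQTYIIQE"
    "EMVVTEHVSD" "KEALGSFIYH" "LCNGKDTYRL" "RRRATRRRIN" "KRGAKNCNAI"
    "RHFENTFVVE" "TLICGVV"
)

# Assumed PROSITE pattern definitions for a subset of the entries above.
PATTERNS = {
    "PS00001 ASN_GLYCOSYLATION": "N-{P}-[ST]-{P}",
    "PS00004 CAMP_PHOSPHO_SITE": "[RK](2)-x-[ST]",
    "PS00005 PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
    "PS00006 CK2_PHOSPHO_SITE": "[ST]-x(2)-[DE]",
    "PS00008 MYRISTYL": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",
}

for name, pattern in PATTERNS.items():
    hits = ", ".join(f"{s}->{e}" for s, e in scan(HUTEL_2H3, pattern))
    print(f"{name}: {hits}")
```

The zero-width lookahead in scan is used so that overlapping hits are all reported, as in the table.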



(No Pfam data available for DKFZphutel_2h3.2) 
DKFZphmcfl_lall 



group: transmembrane protein 

DKFZphmcfl_lall encodes a novel 393 amino acid protein with weak similarity to the S. pombe protein SPBC29A3_3 and the S. cerevisiae putative membrane protein YDR255c. 

The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of mammary carcinoma-specific genes and as a new marker for mammary carcinoma cells. 



similarity to YDR255c and SPBC29A3.03c 
membrane regions: 1 

Summary DKFZphmcfl_lall encodes a novel 393 amino acid protein, with similarity to YDR255c and SPBC29A3.03c. 



similarity to YDR255c and SPBC29A3.03c 

complete cDNA, complete cds, EST hits 

potential start at Bp 110 matches Kozak consensus 

Sequenced by DKFZ 

Locus: /map="542.7 cR from top of Chr5 linkage group" 
Insert length: 1819 bp 

Poly A stretch at pos. 1808, no polyadenylation signal found 
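The note that the potential start at Bp 110 matches the Kozak consensus can be made concrete. The following is a minimal sketch. It uses the common simplification that only two positions are scored, relative to the A of ATG at +1: a purine at -3 and a G at +4. The function name kozak_context and the strong/adequate/weak labels are illustrative. They are not taken from the program that annotated this clone.

```python
def kozak_context(seq: str, atg_pos: int) -> str:
    """Score the start codon whose A sits at 1-based position atg_pos in seq.

    Simplified rule: "strong" if there is a purine at -3 and a G at +4,
    "adequate" if only one of the two holds, otherwise "weak".
    """
    seq = seq.upper()
    i = atg_pos - 1                      # 0-based index of the A of ATG
    if seq[i:i + 3] != "ATG":
        raise ValueError(f"no ATG at position {atg_pos}")
    if i < 3 or i + 3 >= len(seq):
        raise ValueError("not enough flanking sequence to score")
    purine_at_minus3 = seq[i - 3] in "AG"
    g_at_plus4 = seq[i + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"

# Bases 101-120 of DKFZphmcfl_lall; the ATG at Bp 110 is position 10 here.
print(kozak_context("GAGGCCACCATGGAGCAGTG", 10))  # strong
```

The context here (GCCACCATGG) has an A at -3 and a G at +4, so it scores as "strong" under this simplified rule.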



1 CCCGGCCCAG CCCCCGAAGA GCCGCCTCAG CCGGGGGGAG TTGCTCGGAC 
51 TCAAACGTCC AGTCCTCGTG CGACCGCGCT GGGTCGGAAG TGAGCAGGCT 
101 GAGGCCACCA TGGAGCAGTG TGCGTGCGTG GAGAGAGAGC TGGACAAGGT 
151 CCTGCAGAAG TTCCTGACCT ACGGGCAGCA CTGTGAGCGG AGCCTGGAGG 
201 AGCTGCTGCA CTACGTGGGC CAGCTGCGGG CTGAGCTGGC CAGCGCAGCC 
251 CTCCAGGGGA CCCCTCTCTC AGCCACCCTC TCTCTGGTGA TGTCACAGTG 
301 CTGCCGGAAG ATCAAAGATA CGGTGCAGAA ACTGGCTTCG GACCATAAGG 
351 ACATTCACAG CAGTGTATCC CGAGTGGGCA AAGCCATTGA CAGGAACTTC 
401 GACTCTGAGA TCTGTGGTGT TGTGTCAGAT GCGGTGTGGG ACGCGCGGGA 
451 ACAGCAGCAG CAGATCCTGC AGATGGCCAT CGTGGAACAC CTGTATCAGC 
501 AGGGCATGCT CAGCGTGGCC GAGGAGCTGT GCCAGGAATC AACGCTGAAT 
551 GTGGACTTGG ATTTCAAGCA GCCTTTCCTA GAGTTGAATC GAATCCTGGA 
601 AGCCCTGCAC GAACAAGACC TGGGTCCTGC GTTGGAATGG GCCGTCTCCC 
651 ACAGGCAGCG CCTGCTGGAA CTCAACAGCT CCCTGGAGTT CAAGCTGCAC 
701 CGACTGCACT TCATCCGCCT CTTGGCAGGA GGCCCCGCGA AGCAGCTGGA 
751 GGCCCTCAGC TATGCTCGGC ACTTCCAGCC CTTTGCTCGG CTGCACCAGC 
801 GGGAGATCCA GGTGATGATG GGCAGCCTGG TGTACCTGCG GCTGGGCTTG 
851 GAGAAGTCAC CCTACTGCCA CCTGCTGGAC AGCAGCCACT GGGCAGAGAT 
901 CTGTGAGACC TTTACCCGGG ACGCCTGTTC CCTGCTGGGG CTTTCTGTGG 
951 AGTCCCCCCT TAGCGTCAGC TTTGCCTCTG GCTGTGTGGC GCTGCCTGTG 
1001 TTGATGAACA TCAAGGCTGT GATTGAGCAG CGGCAGTGCA CTGGGGTCTG 
1051 GAATCACAAG GACGAGTTAC CGATTGAGAT TGAACTAGGC ATGAAGTGCT 
1101 GGTACCACTC CGTGTTCGCT TGCCCCATCC TCCGCCAGCA GACGTCAGAT 
1151 TCCAACCCTC CCATCAAGCT CATCTGTGGC CATGTTATCT CCCGAGATGC 
1201 ACTCAATAAG CTCATTAATG GAGGAAAGCT GAAGTGTCCC TACTGTCCCA 
1251 TGGAGCAGAA CCCGGCAGAT GGGAAACGCA TCATATTCTG ATTCCTACCT 
1301 GGAAGGAATT TTGTTGAAAG GGGTTTTCAC CTGTGAGCCT TGGTCTGTCT 
1351 CGGTAGGGTG GTCAACTTCA GTGGACTGTG GTTGGTTTCA GAGCGCCTGG 
1401 CTGAGGAGTT CCACTGAGGG GAGCACTGGA GCAGCCCTTT GGCAGAGGCT 
1451 GAGGAGGGAG ATGGACCAGC CCACGCCTGG CACCTGGCTC CATGGCATAA 
1501 GGAAAGGGAG ATGCTGGCCT CTGTGCTCCT GCTGTCTTTT CCTGTTTCTG 
1551 TTTGCGTTTG ACTTAGTAGC AACCGACAGA GTGGCAAGGG ATTTGGTCTT 
1601 CAGCAGTAGA CATCCTTCCA CCCCTGCCCT CAGCCAAGTC TCTTGCTGCC 
1651 ATGCCAATGC TATGTCCACC CTTGCCCCTC GGCCCAAGAG TGTCCAGCGG 
1701 TGGCCCACCT CTTCCTCCCA CTACAGCCTC AACAGTATGT ACCATCTCCC 
1751 ACTGTAAATA GTCCCAGTTA GAACGGAATG CCGTTGTTTT ATAACTTTGA 
1801 ACAAATGTAA AAAAAAAAA 



BLAST Results 



Entry HS579359 from database EMBL: 
human STS WI-6350. 
Score = 1027, P = 9.9e-40, identities = 207/209 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 110 bp to 1288 bp; peptide length: 393 
Category: similarity to unknown protein 



1 MEQCACVERE LDKVLQKFLT YGQHCERSLE ELLHYVGQLR AELASAALQG 

51 TPLSATLSLV MSQCCRKIKD TVQKLASDHK DIHSSVSRVG KAIDRNFDSE 

101 ICGVVSDAVW DAREQQQQIL QMAIVEHLYQ QGMLSVAEEL CQESTLNVDL 

151 DFKQPFLELN RILEALHEQD LGPALEWAVS HRQRLLELNS SLEFKLHRLH 

201 FIRLLAGGPA KQLEALSYAR HFQPFARLHQ REIQVMMGSL VYLRLGLEKS 

251 PYCHLLDSSH WAEICETFTR DACSLLGLSV ESPLSVSFAS GCVALPVLMN 

301 IKAVIEQRQC TGVWNHKDEL PIEIELGMKC WYHSVFACPI LRQQTSDSNP 

351 PIKLICGHVI SRDALNKLIN GGKLKCPYCP MEQNPADGKR IIF 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcfl_lall, frame 2 

TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical protein"; S.pombe chromosome II cosmid c29A3., N = 2, Score = 302, P = 3.4e-42 

PIR:S67312 probable membrane protein YDR255c - yeast (Saccharomyces cerevisiae), N = 1, Score = 271, P = 5.3e-22 

TREMBL:CET07D1_2 gene: "T07D1.2"; Caenorhabditis elegans cosmid T07D1., N = 1, Score = 193, P = 5.6e-13 



>TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical protein"; 
S.pombe chromosome II cosmid c29A3. 
Length = 398 

HSPs: 



Score = 302 (45.3 bits), Expect = 3.4e-42, Sum P(2) = 3.4e-42 
Identities = 55/142 (38%), Positives = 89/142 (62%) 



Query: 252 YCHLLDSSHWAEICETFTRDACSLLGLSVESPLSVSFASGCVALPVLMNIKAVIEQRQCT 311 

Y +LD W + F R+ C+ LG+S+ESPL + +G +ALP+L+ + ++++++ 

Sbjct: 258 YIDVLDLD-WKSLELLFVREFCAALGMSLESPLDIVVNAGAIALPILLKMSSIMKKKHTE 316 

Query: 312 GVWNHKDELPIEIELGMKCWYHSVFACPILRQQTSDSNPPIKLICGHVISRDALNKLING 371 

W + ELP+EI L +HSVF CP+ ++Q ++ NPP+ 4 CGHVI +++L +L 

Sbjct: 317 --WTSQGELPVEIFLPSSYHFHSVFTCPVSKEQATEENPPMMMSCGHVIVKESLRQLSRN 374 

Query: 372 G--KLKCPYCPMEQNPADGKRIIF 393 

G + KCPYCP E AD R+ F 

Sbjct: 375 GSQRFKCPYCPNENVAADAIRVYF 398 




Score = 161 (24.2 bits), Expect = 3.4e-42, Sum P(2) = 3.4e-42 
Identities = 51/221 (23%), Positives = 102/221 (46%) 

Query: 22 GQHCERSLEELLHYVGQLRAELASAALQGTPLSATLSLVMSQCCRKIKDTVQKLASDHKD 81 

G C L EL +++L+P++LVCK+ L K 

Sbjct: 15 GNKCLAKLNEL ESILKDAKKSCLKD-PTTSMKELVA — CSEKTQQVFDDLKRTEKK 67 

Query: 82 IHSSVSRVGKAIDRNFDSEICGVVSDAVWDAREQQQQILQMAIVEHLYQQGMLSVAEELC 141 

H+S++R GK +++ F+ ++ + + +++++++ + A+ H ++QG + +A C 

Sbjct: 68 FHTSLNRFGKTLEKKFNFDLEDIKLHSSFESKKRE— IDTALSLHFFRQGDVELAHLFC 124 

Query: 142 QESTLNVDLDFKQPFLELNRILEALHEQDLGPALEWAVSHRQRLLELNSSLEFKLHRLHF 201 

+E+ + + F L I + + + ++DL +EWA R L SSLE+ L + 

Sbjct: 125 KEAGIEEPSESLHVFTLLKSIVQGIRDKDLKLPIEWASQCRGYLERKGSSLEYTLQKYRL 184 

Query: 202 IRLLAGGPAKQL-EALSYAR-HFQPFARLHQREIQVMMGSLVY 242 

+ K+A+YR+ F + H +IQ M +L + 

Sbjct: 185 VSNYL — TTKDIMAAIRYCRTNMAEFQKKHLADIQKTMIALFF 225 
Pedant information for DKFZphmcfl_lall, frame 2 



Report for DKFZphmcfl_lall.2 

[LENGTH] 393 

[MW] 44414.77 

[pI] 6.15 

[HOMOL] TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical protein"; S.pombe chromosome II cosmid c29A3. 2e-39 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR255c] 8e-23 

[PIRKW] transmembrane protein 2e-21 

[PROSITE] MYRISTYL 2 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PROKAR_LIPOPROTEIN 1 

[PROSITE] TYR_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 1 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 1 





SEQ MEQCACVERELDKVLQKFLTYGQHCERSLEELLHYVGQLRAELASAALQGTPLSATLSLV 

PRD ccceeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhh 

MEM 

SEQ MSQCCRKIKDTVQKLASDHKDIHSSVSRVGKAIDRNFDSEICGVVSDAVWDAREQQQQIL 

PRD hhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhccccceeeechhhhhhhhhhhhhhh 

MEM 

SEQ QMAIVEHLYQQGMLSVAEELCQESTLNVDLDFKQPFLELNRILEALHEQDLGPALEWAVS 

PRD hhhhhhhhhhhccchhhhhhhhhhhccccccccchhhhhhhhhhhhhhccccchhhhhhh 

MEM 

SEQ HRQRLLELNSSLEFKLHRLHFIRLLAGGPAKQLEALSYARHFQPFARLHQREIQVMMGSL 

PRD hhhhhhhcccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ VYLRLGLEKSPYCHLLDSSHWAEICETFTRDACSLLGLSVESPLSVSFASGCVALPVLMN 

PRD hhcccccccccccccccchhhhhhhhhhhhhhhhhhhhcccccceeeecccccchhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ IKAVIEQRQCTGVWNHKDELPIEIELGMKCWYHSVFACPILRQQTSDSNPPIKLICGHVI 

PRD hhhhhhhhhhhcccccccccceeeeeccceeeeeeeecchhhhhccccccccccccceee 

MEM MMMMMM 

SEQ SRDALNKLINGGKLKCPYCPMEQNPADGKRIIF 

PRD eehhhhhhhccccccccccccccchhhhhcccc 

MEM 



Prosite for DKFZphmcfl_lall.2 

PS00001 189->193 ASN_GLYCOSYLATION PDOC00001 

PS00005 180->183 PKC_PHOSPHO_SITE PDOC00005 

PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006 

PS00006 135->139 CK2_PHOSPHO_SITE PDOC00006 

PS00006 190->194 CK2_PHOSPHO_SITE PDOC00006 

PS00007 211->219 TYR_PHOSPHO_SITE PDOC00007 

PS00007 27->36 TYR_PHOSPHO_SITE PDOC00007 

PS00007 244->253 TYR_PHOSPHO_SITE PDOC00007 

PS00008 37->43 MYRISTYL PDOC00008 

PS00008 50->56 MYRISTYL PDOC00008 

PS00009 387->391 AMIDATION PDOC00009 

PS00013 282->293 PROKAR_LIPOPROTEIN PDOC00013 



(No Pfam data available for DKFZphmcfl_lall.2) 
DKFZphmcfl_lc23 

group: mammary carcinoma derived 

DKFZphmcfl_lc23.1 encodes a novel 311 amino acid proline-rich protein. 
No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of mammary carcinoma-specific genes. 

unknown, proline-rich protein 

complete cDNA, complete cds? potential start at Bp 50, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 3077 bp 

Poly A stretch at pos. 3067, polyadenylation signal at pos. 3048 

1 AACTGGCCCC CTCCCCCACC CCCTGCCCCT GAGGAGCAGG ACCTGTCCAT 
51 GGCTGACTTC CCCCCACCAG AGGAGGCTTT TTTCTCTGTG GCCAGCCCTG 

101 AGCCTGCAGG CCCTTCAGGC TCCCCAGAGC TTGTCAGCTC CCCGGCTGCT 

151 TCGTCCTCCT CAGCTACTGC TTTGCAGATT CAGCCCCCGG GTAGCCCAGA 

201 CCCTCCTCCA GCTCCGCCAG CCCCAGCTCC TGCTAGTTCC GCCCCAGGGC 

251 ATGTGGCCAA GCTCCCTCAG AAGGAACCGG TGGGCTGTAG CAAGGGTGGT 

301 GGGCCTCCCA GGGAGGACGT AGGTGCGCCC CTGGTCACGC CCTCGCTCCT 

351 GCAGATGGTG CGGCTGCGCT CCGTGGGTGC TCCAGGAGGG GCTCCCACCC 

401 CAGCACTGGG GCCATCGGCC CCCCAGAAAC CACTGCGAAG GGCCCTGTCA 

451 GGGCGGGCCA GCCCAGTGCC TGCCCCCTCC TCAGGGCTCC ATGCTGCGGT 

501 CCGACTCAAG GCCTGCAGCC TGGCCGCCAG TGAAGGCCTC TCAAGTGCTC 

551 AGCCCAACGG ACCGCCTGAG GCAGAGCCAC GGCCTCCCCA GTCCCCTGCC 

601 TCAACGGCCA GTTTCATCTT CTCCAAGGGC TCTAGGAAGC TGCAGCTGGA 

651 GCGGCCCGTG TCCCCTGAGA CCCAGGCTGA CCTCCAGCGG AATCTGGTGG 

701 CAGAACTCCG GAGCATCTCA GAGCAGCGGC CACCCCAGGC CCCAAAGAAG 

751 TCACCTAAGG CTCCCCCACC TGTGGCCCGC AAGCCGTCTG TGGGAGTCCC 

801 CCCACCCGCC TCCCCCAGTT ACCCTCGAGC TGAGCCCCTT ACTGCTCCTC 

851 CCACCAATGG GCTCCCTCAC ACCCAGGACA GGACTAAGAG GGAGCTGGCG 

901 GAGAATGGAG GTGTCCTGCA GCTGGTGGGC CCAGAGGAGA AGATGGGCCT 

951 CCCGGGCTCA GACTCACAGA AAGAGCTGGC CTGACCACCA GGCACCTCAC 
1001 TGGCACTGCT GACCCATCCC AGAAACACAA TCTCAGGGAC CCGAGCAGCT 
1051 CCAAGGACGA GAGGATACAG CAGACACAAC CTAATAGAGA GGGCGCCTGC 
1101 AGCCTTAACC TCCACGGCCT TCGATACTTA TGCAAGCCTG GTGTTGCTCC 
1151 TGTCCTCAGA GTCATCCTGC GCTCATGCCT TTTCCCGAAT GGGTTCACCT 
1201 CTGGCAGTTG CCGCTTCAGT CTTGGCCTTA GCCTCATCTT GAAGTGGGTA 
1251 GCTGGCGGGA GAGGGTGGCT GCGCCCCCTG CTGGCCCTGA GGCTGCAGAG 
1301 TTGGGAGCAG GACACCTCAC CTGAGTTTCA TTTTTTTTCA TGTCCAAACC 
1351 ATGCACATAC TATAGTCCAG AATCAAAGCA CTTTTGAAAA GTGGCTGCAT 
1401 GGCCATCCTC CAGGGCCCAG GAAGTTGCAT TCCAAGGGCC TGTTTACATG 
1451 GCAGCAGAAT CCATCCCCGG CAGTCAGCCC ATAGCTTGGG ACCAGTCTGT 
1501 GCCCTCCTGC CCAGTCCAGT TTACTCCTCT TGGTTCCTGA AGGTGGCCAA 
1551 GTCATTGTGT TCCCACAGGC TTCTCTAGGC TGGGGGCAGG TGTGGGGCTG 
1601 TGGAATTCCA AAGCACAAAA GGTGCAGAGG GGATTGGCCT TCCTGTGCCT 
1651 CAACTCACCA ACCACCCTCC TGCCTTCCAG TTCTGCCAGG TGCTCCATGC 
1701 TGGGGACAAG TAGGAGACTG CCAGGGCCCA AAGAAATGGG TGAGCAGTAG 
1751 AGTCATCTCG GGGCACTTGG CAGTGTCAAG CACCTGCCCC TTGCCTCCTT 
1801 GACCACACTG GGGTGGGTGG GCCCCCAGCA CTTCAGAGGC AGGAGCCTTT 
1851 GGGCTGAGCA AGCACTGAGG AGGTGGATGG AAGGGAGCAT CTGGAGGGGG 
1901 GGAGCTTCCT TGAGCAGTGG GCCCAGGCCT GGCCCTCCAC ACTTCATTCT 
1951 CTGACCTTTC TCTCTCCTCA TTTCGGTGCA TGTCCTTTCT GCAGCTGCCT 
2001 TTCAGCACAG GTGGTTCCAC TGGGGGCAGC TAACGCTGAG TGACAAGGAT 
2051 GGGAAGCCAC AGGTGCATTT TACTCAAGTC TTCTCTAGTC AATGAGGGGC 
2101 ACCCAGTGCT TCTAGGGCAG GCTGGGTGGT GGTCCCCTAG GTATCAGCCT 
2151 CTCTTACTGT ACTCTCCGGG AATGTTAACC TTTCTATTTT CAGCCTGTGC 
2201 CACCTGTCTA GGCAAGCTGG CTTCCCCATT GGCCCCTGTG GGTCCACAGC 
2251 AGCGTGGCTG CCCCCCAGGG CCACCGCTTC TTTCTTGATC CTCTTTCCTT 
2301 AACAGTGACT TGGGCTTGAG TCTGGCAAGG AACCTTGCTT TTAGCTTCAC 
2351 CACCAAGGAG AGAGGTTGAC ATGACCTCCC CGCCCCCTCA CCAAGGCTGG 
2401 GAACAGAGGG GATGTGGTGA GAGCCAGGTT CCTCTGGCCC TCTCCAGGGT 
2451 GTTTTCCACT AGTCACTACT GTCTTCTCCT TGTAGCTAAT CAATCAATAT 
2501 TCTTCCCTTG CCTGTGGGCA GTGGAGAGTG CTGCTGGGTG TACGCTGCAC 
2551 CTGCCCACTG AGTTGGGGAA AGAGGATAAT CAGTGAGCAC TGTTCTGCTC 
2601 AGAGCTCCTG ATCTACCCCA CCCCCTAGGA TCCAGGACTG GGTCAAAGCT 
2651 GCATGAAACC AGGCCCTGGC AGCAACCTGG GAATGGCTGG AGGTGGGAGA 
2701 GAACCTGACT TCTCTTTCCC TCTCCCTCCT CCAACATTAC TGGAACTCTA 
2751 TCCTGTTAGG ATCTTCTGAG CTTGTTTCCC TGCTGGGTGG GACAGAGGAC 
2801 AAAGGAGAAG GGAGGGTCTA GAAGAGGCAG CCCTTCTTTG TCCTCTGGGG 
2851 TAAATGAGCT TGACCTAGAG TAAATGGAGA GACCAAAAGC CTCTGATTTT 
2901 TAATTTCCAT AAAATGTTAG AAGTATATAT ATACATATAT ATATTTCTTT 
2951 AAATTTTTGA GTCTTTGATA TGTCTAAAAA TCCATTCCCT CTGCCCTGAA 
3001 GCCTGAGTGA GACACATGAA GAAAACTGTG TTTCATTTAA AGATGTTAAT 
3051 TAAATGATTG AAACTTGAAA AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 49 bp to 981 bp; peptide length: 311 
Category: putative protein 
Classification: unset 



1 MADFPPPEEA FFSVASPEPA GPSGSPELVS SPAASSSSAT ALQIQPPGSP 
51 DPPPAPPAPA PASSAPGHVA KLPQKEPVGC SKGGGPPRED VGAPLVTPSL 
101 LQMVRLRSVG APGGAPTPAL GPSAPQKPLR RALSGRASPV PAPSSGLHAA 
151 VRLKACSLAA SEGLSSAQPN GPPEAEPRPP QSPASTASFI FSKGSRKLQL 
201 ERPVSPETQA DLQRNLVAEL RSISEQRPPQ APKKSPKAPP PVARKPSVGV 
251 PPPASPSYPR AEPLTAPPTN GLPHTQDRTK RELAENGGVL QLVGPEEKMG 
301 LPGSDSQKEL A 
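The summary calls DKFZphmcfl_lc23 a proline-rich protein. To make that concrete, the sketch below counts prolines in the 311-residue peptide listed above. The helper name residue_fraction is hypothetical. About one residue in five is proline, roughly four times the approximate 5% proline frequency usually quoted for proteins in general (a general figure, not taken from this listing).

```python
# DKFZphmcfl_lc23 peptide (frame 1), copied from the listing above.
LC23 = (
    "MADFPPPEEA" "FFSVASPEPA" "GPSGSPELVS" "SPAASSSSAT" "ALQIQPPGSP"
    "DPPPAPPAPA" "PASSAPGHVA" "KLPQKEPVGC" "SKGGGPPRED" "VGAPLVTPSL"
    "LQMVRLRSVG" "APGGAPTPAL" "GPSAPQKPLR" "RALSGRASPV" "PAPSSGLHAA"
    "VRLKACSLAA" "SEGLSSAQPN" "GPPEAEPRPP" "QSPASTASFI" "FSKGSRKLQL"
    "ERPVSPETQA" "DLQRNLVAEL" "RSISEQRPPQ" "APKKSPKAPP" "PVARKPSVGV"
    "PPPASPSYPR" "AEPLTAPPTN" "GLPHTQDRTK" "RELAENGGVL" "QLVGPEEKMG"
    "LPGSDSQKEL" "A"
)

def residue_fraction(protein: str, residue: str) -> float:
    """Fraction of positions in `protein` occupied by `residue`."""
    return protein.count(residue) / len(protein)

n_pro = LC23.count("P")
print(len(LC23), n_pro, f"{100 * residue_fraction(LC23, 'P'):.1f}%")  # 311 62 19.9%
```

Much of this proline content falls in PPxP-type repeats. Those repeats probably explain why the best BLASTP hits below are extensin-like and proline-rich proteins.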

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcfl_lc23, frame 1 

PIR:S49915 extensin-like protein - maize, N = 1, Score = 215, P = 6.1e-15 

PIR:A28996 proline-rich protein M14 precursor - mouse, N = 1, Score = 191, P = 3.8e-13 



>PIR:S49915 extensin-like protein - maize 
Length = 1,188 

HSPs: 

Score = 215 (32.3 bits), Expect = 6.1e-15, P = 6.1e-15 
Identities = 81/269 (30%), Positives = 115/269 (42%) 

Query: 5 PPPEEAFFS VASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSP— DPPP A 55 

PPP S V SP P P SP PA +SS ++ PP +P PPP + 

Sbjct: 598 PPPPAPVASPPPPVKSPPPPTPVASPP PPAPVASSPPPMKSPPPPTPVSSPPPPEKS 654 

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PPPPASP +P P K PP + +P + PS + P 

Sbjct: 655 PPPPPPAKSTPPP-EEYPT— PPTSVKSSPPPEKSLPPPTLIPSPPPQEKPTPPSTPSKP 711 

Query: 116 PTPALGPSAPQKPLRRA-LSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 

P+ PS P+ + P+ + ++SP PAP S +LA S + + PP 

Sbjct: 712 PSSPEKPSPPKEPVSSPPQTPKSSPPPAPVSSPPPTPVSSPPALAPVSSPPSVKSSPPPA 771 

Query: 175 AEPRPPQSPASTASFIFSKGSRKLQLERPV-SPETQADLQRNLVAELRSISEQRPPQAPK 233 

PP +p +S +Q+ P +P++ L V+ + + PP AP 

Sbjct: 772 PLSSPPPAPQVKSS PPPVQVSSPPPAPKSSPPLAP— VSSPPQVEKTSPPPAPL 823 

Query: 234 KSPKAPP PVARKPSVGV — PPPASPSYPRAEPLTAPPTNGLP 273 

SP P + P V V PPP S P P+++PP P 
Sbjct: 824 SSPPLAPK-SSPPHVVVSSPPPVVKSSPPPAPVSSPPLTPKP 864 

Score = 206 (30.9 bits), Expect = 9.1e-14, P = 9.1e-14 
Identities = 82/261 (31%), Positives = 108/261 (41%) 

Query: 17 PEPAG-PSGSPELVSSPAASS SSATALQIQPPGSPDPPPAP PAPAPASSAPGHV 69 

P P G P SP + PAAS+ ST + P P+P P P P P P +P 

Sbjct: 410 PTPGGGPPSSP-VPGKPAASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPAD 468 

Query: 70 AKLPQKEPV-GCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQKP 128 

+P PV G S P V P + +V+L AP G+P P + ++P P 

Sbjct: 469 DYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTSPPAP 528 

Query: 129 LRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQS PASTAS 188 

+ G SP P P S + +K+ AG + P PPE P PP AS 

Sbjct: 529 I GSPSP-PPPVSVVSPPPPVKSPPPPAPVG SPP— PPEKSPPPPAPVASPPP 577 

Query: 189 FIFSKGSRKLQLERPVS PETQADLQRNLVAELRS I SEQRPPQAPKKSPKAPPPVARK PS - 247 

+ S L P P ++ VA + PP P SP P PVA P 

Sbjct: 578 PVKSPPPPTLVASPP--PPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPP 635 



Query: 248 VGVPPP ASPSYPRAEPLTAPPTNGLPHTQD 277 

+ PPP +SP P P PP P ++ 
Sbjct: 636 MKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEE 669 



Score = 202 (30.3 bits), Expect = 2.9e-13, P = 2.9e-13 
Identities = 81/254 (31%), Positives = 110/254 (43%) 



Query: 16 SPEPAGPSGSPELV— SSP--AASSSSATALQIQPPGSP-DPPPAPPAPAPASSAPGHVA 70 

SP PA P SP L SSP SS ++ PP +P PP P PA S P HV+ 
Sbjct: 817 SPPPA-PLSSPPLAPKSSPPHVVVSSPPPVVKSSPPPAPVSSPPLTPKPA SPPAHVS 872 

Query: 71 KLPQ K E P VGC S KGGG P P RE D VG A P LVT P S LLQMV RL RS VG APGG A PT P ALGP S A PQ 126 

P+ P + PP E +P TP L ++S P +P + P + 

Sbjct: 873 SPPEVVKPSTPPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKSSP 932 

Query: 127 KPLRRAL— SGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPKGPPEAEPRPPQSP 183 

P+ + + ++SP PAP S A K+ A L P PPE + PP +P 
Sbjct: 933 PPVVVSSPPPTVKSSPPPAPVSSPPATP--KSSPPPAPVNL P— PPEVKSSPPPTP 984 

Query: 184 ASTASFI FS KGSRKLQLERPVS PETQADLQRNLVAELRS I SEQRPPQAPKKSPKAPP PVA 243 

S+- + P PE ++ V+ + PP AP SP PPPV 

Sbjct: 985 VSSPPPAPKSSPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVKSPPPPAPVSSP — PPPVK 1042 

Query: 244 RKPS VGVPPPASPSYPRAEPLTAPP 268 

P V PPP S P P+++PP 
Sbjct: 1043 SPPPPAPVSSPPPPVKSPPPPAPISSPP 1070 



Score = 190 (28.5 bits), Expect = 7.9e-12, P = 7.9e-12 
Identities = 74/264 (28%), Positives = 111/264 (42%) 



Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAAS-SSSATALQIQPPGSPDPPPAPPAPAPAS 63 

PPP S PE + P P +P + T+++ PP PP P+P 

Sbjct: 639 PPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLPPPTLIPSPPP 698 

Query: 64 SAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPS 123 

P K P K PP+E V +P TP V +P PTP P 

Sbjct: 699 QEKPTPPSTPSKPPSSPEKPS-PPKEPVSSPPQTPK—SSPPPAPVSSP— PPTPVSSPP 753 

Query: 124 APQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSP 183 

A P+ S ++SP PAP S A ++K+ + + + P PP + PP +P 
Sbjct: 754 A-LAPVSSPPSVKSSPPPAPLSSPPPAPQVKS SPPPVQVSSP— PPAPKSSPPLAP 806 

Query: 184 AST AS FIFSKGSRKLQLERP-VS PETQADLQRNLVAELRS I SEQRPPQAPKKSPKAPPPV 242 

S+ + LP ++P++ +V+ f + PP AP SP P 

Sbjct: 807 VSSPPQVEKTSPPPAPLSSPPLAPKSSPP--HVVVSSPPPVVKSSPPPAPVSSPPLTPKP 864 

Query: 243 ARKPS-VGVPP PASPSYPR AEPLTAPP 268 

A P+ V PP P++P P +EP ++PP 

Sbjct: 865 ASPPAHVSSPPEVVKPSTPPAPTTVISPPSEPKSSPP 901 



Score = 189 (28.4 bits), Expect = 1.0e-11, P = 1.0e-11 
Identities = 86/271 (31%), Positives = 112/271 (41%) 



Query: 5 PPPEEAFFSVASPEPAGPSGSPEL-VSSP — AASSSSATALQIQPPG--SPDPPPAP 56 

PPP AS P P S P + VSSP A SS A PP PPPAP 
Sbjct: 768 PPP — APLSSPPPAPQVKSSPPPVQVSSPPPAPKSSPPLAPVSSPPQVEKTSPPPAPLSS 825 

Query: 57 PAP APASSAPGHVAKLPQKEP VGC S KGGG PPREDVGAPLVTPSLLQMVRLRSVGAPGGAP 116 

P AP SS P V P PV S PP V +P +TP V +P 

Sbjct: 826 PPLAPKSSPPHVVVSSPP--PVVKSS PPPAPVSSPPLTPKPASPPA — HVSSPPEVV 878 

Query: 117 TPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKAC-SLAASEGL— -SSAQP— 169 
P+ P AP + ++SP P P S V+ ++ +S + SS P 






WO 01/12659 



PCT/IB00/01496 



Sbjct: 879 KPST-PPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKSSPPPVVV 937 

Query: 170 -NGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRP 228 

+ PP + PP +P S+ + P PE ++ V+ + P 

Sbjct: 938 SSPPPTVKSSPPPAPVSSPPATPKSSPPPAPVNLP-PPEVKSSPPPTPVSSPPPAPKSSP 996 

Query: 229 PQAPKKSPKAPPPVARKPS VGVPPPASPSYPRAEPLTAPP 268 

P AP SP PPP + P V PPP S P P+++PP 

Sbjct: 997 PPAPMSSP — PPPEVKSPPPPAPVSSPPPPVKSPPPPAPVSSPP 1038 

Score = 181 (27.2 bits), Expect = 8.8e-11, P = 8.8e-11 
Identities = 73/277 (26%), Positives = 105/277 (37%) 

Query: 3 DFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPP GSPDPP PA 55 

D+ PP V PS SP+ V PAAS+ + +++ PP GSP PP + 

Sbjct: 469 DYVPPTPP VPGKSPPATSPSPQ-VQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 56 PPAPAPASSAPGHVAKL PQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGA 111 

PPAP + SPV++ PKP + GPP+ P P ++S 
Sbjct: 525 PPAPIGSPSPPPPVSVVSPPPPVKSPPPPAPVGSPPPPEKSPPPPAPVASPPPPVKSPPP 584 

Query: 112 PG — GAPTPALGPSAPQKPLRRA LSGRASPVPAPSSGLHAAVRLKACSLAASEGLSS 166 

P +PP+ PP+ + PPS AV ++ + 

Sbjct: 585 PTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPPMKSPPPPTP 644 

Query: 167 AQPNGPPEAEPRPPQS PASTAS FIFSKGSRKLQLERPVSPETQADLQRNLVAELRS I SEQ 226 

PPE P PP PA + + ++ PE L+ + 

Sbjct: 645 VSSPPPPEKSP-PPPPPAKSTPFPEEYPTPPTSVKSSPPPEKSLP-PPTLIPSPPPQEKP 702 

Query: 227 RPPQAPKKSPKAPP-PVARKPSVGVPPPASPSYPRAEPLTAPP 268 

PP P K P +P P K V PP S P P+++PP 
Sbjct: 703 TPPSTPSKPPSSPEKPSPPKEPVSSPPQTPKSSPPPAPVSSPP 745 

Score = 177 (26.6 bits), Expect = 2.6e-10, P = 2.6e-10 
Identities = 78/264 (29%), Positives = 105/264 (39%) 

Query: 5 PPPEEAFFSVASPEPAGP SGSPELVSSPAASSSSATALQIQPPGSP--DPPPAP-- 56 

PPP +P+PA P S PE+V P+ + T I PP P PPP P 

Sbjct: 850 PPPAPVSSPPLTPKPASPPAHVSSPPEVVK-PSTPPAPTTV--ISPPSEPKSSPPPTPVS 906 

Query: 57 -PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

P P SS P + P P PP V +P P++ V +P 

Sbjct: 907 LPPPIVKSSPPPAMVSSPPMTPKS SPPPVVVSSP--PPTVKSSPPPAPVSSPPAT 959 

Query: 116 PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEA 175 

P + P+ P ++SP PPS A + S +SS P PPE 

Sbjct: 960 PKSSPPPAPVNLPPPEV KSSPPPTPVSSPPPAPK SSPPPAPMSSP-P--PPEV 1009 

Query: 176 EPRPPQS PASTAS FIFSKGSRKLQLERPVSPETQADLQRNLVAELRS I SEQRPPQAPKKS 235 

+ PP +P S+ + p P ++ V+ + PP AP S 

Sbjct: 1010 KSPPPPAPVSSPPPPVKSPPPPAPVSSP-PPPVKSPPPPAPVSSPPPPVKSPPPPAPISS 1068 

Query: 236 PKAPPPVARKPS VGVPPPASPSYPRAEPLTAPP 268 

P PPPV P V PPP S P P+++PP 
Sbjct: 1069 P— PPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPP 1102 

Score = 177 (26.6 bits), Expect = 2.6e-10, P = 2.6e-10 
Identities = 82/267 (30%), Positives = 110/267 (41%) 

Query: 17 PEPAG-PSGSPELVSSPAASS SSATALQIQPPGSPDPPPAP PAPAPASSAPGHV 69 

P P G P SP + PAAS+ ST + P P+P P P PPP +P 
Sbjct: 410 PTPGGGPPSSP-VPGKPAASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPAD 468 

Query: 70 AKLPQKEPV-GCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQKP 128 

+P PV G S P V P + +V+L AP G+P P + ++P P 

Sbjct: 469 DYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTSPPAP 528 

Query: 129 LRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQS PASTAS 188 

+ G SP P P S + +K+ AG + P PPE P PP AS 
Sbjct: 529 I GSPSP-PPPVSVVSPPPPVKSPPPPAPVG SPP— PPEKSPPPPAPVASPPP 577 

Query: 189 FIFSKGSRKLQLERPV SPETQADLQRNLVAELRS ISEQRPPQA PK 233 

+ S L P SPA+ + ++S ++ PP P 

Sbjct: 578 PVKSPPPPTLVASPPPPVKSPPPPAPVA-SPPPPVKSPPPPTPVASPPPPAPVASSPPPM 636 

Query: 234 KSPKAPPPVARKP— SVGVPPPASPSYPRAEPLTAPPTN 270 

KSP P PV+ P PPP + S P E PPT+ 

Sbjct: 637 KSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTS 676 

Score = 170 (25.5 bits), Expect = 1.6e-09, P = 1.6e-09 
Identities = 78/279 (27%), Positives = 108/279 (38%) 
Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPAPASS 64 

PP S S + P +P + P SS A+ PP +P + PP P SS 

Sbjct: 883 PPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKS--SPP-PVVVSS 939 

Query: 65 APGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPG — GAPTPALGP 122 

P V P PV PP +P PL ++S P +P PA 

Sbjct: 940 PPPTVKSSPPPAPVS SPPATPKSSPPPAPVNLPPPEVKSSPPPTPVSSPPPAPKS 994 

Query: 123 SAPQKPLRRALSG — RASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPP 180 

S P P+ ++ P PAP S V+ S +SS P PP + PP 

Sbjct: 995 SPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVK SPPPPAPVSS— P--PPPVKSPPP 1046 

Query: 181 QSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPP 240 

+P S+ + P P ++ V+ + PP AP SP PP 

Sbjct: 1047 PAPVSSPPPPVKSPPPPAPISSP-PPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSP — PP 1103 

Query: 241 PVARKPS— VGVPPPAS PSYPRAEPLTAPPTNGLPHTQDRTKREL 283 

P+ P V PPPA PS P P+++PP P + ++ L 
Sbjct: 1104 PIKSPPPPAPVSSPPPAPVKPPSLPPPAPVSSPPPVVTPAPPKKEEQSL 1152 

Score = 169 (25.4 bits), Expect = 2.1e-09, P = 2.1e-09 
Identities = 75/266 (28%), Positives = 104/266 (39%) 

Query: 3 DFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPP GSPDPP PA 55 

D+ PP V PS SP+ V PAAS+ + +++ PP GSP PP + 

Sbjct: 469 DYVPPTPP— VPGKSPPATSPSPQ-VQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PPAP + S P V+ + PV PP VG+P P V +P 
Sbjct: 525 PPAPIGSPSPPPPVSVVSPPPPVKSP PPPAPVGSP— PPPEKSPPPPAPVASP 575 

Query: 116 PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEA 175 

P P P P ++ P PAP + V+ S ++S P P + 

Sbjct: 576 PPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVK SPPPPTPVASPPPPAPVAS 631 

Query: 176 EPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

P P +SP K P P S+ PP+ 

Sbjct: 632 SPPPMKSPPPPTPVSSPPPPEKSP— PPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLPP 689 

Query: 236 PK APPPVARK--PSVGVPPPASPSYPRA--EPLTAPP 268 

P +PPP + PS PP+SP P EP+++PP 
Sbjct: 690 PTLIPSPPPQEKPTPPSTPSKPPSSPEKPSPPKEPVSSPP 729 

Score = 168 (25.2 bits), Expect = 2.7e-09, P = 2.7e-09 
Identities = 75/267 (28%), Positives = 102/267 (38%) 

Query: 2 ADFPPPEEAFFSVASPE-PAGPSGSPELVSSPAASSSSATALQIQPPGSPDPP-PAPPAP 59 

A PPP + ++ P+ P G P +SP A S + SP PP +PP P 

Sbjct: 496 ASTPPP— SLVKLSPPQAPVGSPPPPVKTTSPPAPIGSPSPPPPVSVVSPPPPVKSPPPP 553 

Query: 60 APASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPA 119 

AP S P P PV PP + P + S V+ AP +P P 

Sbjct: 554 APVGSPPPPEKSPPPPAPVASPP PPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPP 610 

Query: 120 LGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSL-AASEGLSSAQPNGPPEAEPR 178 

+ P P+ + P PAP + ++ +S P PP A+ 

Sbjct: 611 VKSPPPPTPVA SPPPPAPVASSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKST 664 

Query: 179 PP—QS PASTAS FIFSKGSRKLQLERPV SPETQADLQRNLVAELRSISEQRPPQAPK 233 

PP+PSSKLPSPQ S ++P +P 

Sbjct: 665 PPPEEYPTPPTSVKSSPPPEK-SLPPPTLIPSPPPQEKPTPPSTPSKPPSSPEKP—SPP 721 

Query: 234 KS PKAPPPVARKPS VGVPPPAS PS YPRAEPLTAPP 268 

K P + PP K S PPPA S P P+++PP 
Sbjct: 722 KEPVSSPPQTPKSS PPPAPVSSPPPTPVSSPP 753 

Score = 166 (24.9 bits), Expect = 4.6e-09, P = 4.6e-09 
Identities = 81/268 (30%), Positives = 108/268 (40%) 

Query: 5 PPPEEAF FSVASPEPAGPSGSPE-LVSSPAASSSS ATALQIQPPGSPDPPP-- 54 

PPPE++ VASP P S P LV+SP S A PP PPP 

Sbjct: 560 PPPEKSPPPPAPVASPPPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTP 619 

Query: 55 — APPAPAPASSAPGHVAKLPQKEPVGC SKGGGPPREDVGAPLVTPSLLQMVRLRS 108 

+PP PAP +S+P + P PV K PP P ++S 

Sbjct: 620 VASPPPPAPVASSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKS 679 

Query: 109 VGAPGGA-PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSA 167 

P + PPLPSP P + + ++P PSS + + S SS 
Sbjct: 680 SPPPEKSLPPPTLIPSPP— PQEKP-TPPSTPSKPPSSPEKPSPPKEPVSSPPQTPKSSP 736 